Abstract: In this paper, lower bounds to the probability of 
error in coding for discrete classical and classical-quantum 
channels are studied. Two main open problems are considered: 

i) the problem of bounding the probability of error in the low 
rate region for classical channels with a zero error capacity, and 

ii) the problem of bounding the probability of error for classical- 
quantum channels for all rates below the capacity. 

It is shown that these two problems are intimately connected 
and that, by studying these two types of channels in a compre- 
hensive analysis, the sphere-packing bound of Shannon, Gallager 
and Berlekamp and the Lovasz theta function find a unified 
presentation in terms of information radii. This brings together 
two aspects of coding theory that are usually considered to be 
of a different nature. The main goal of this paper is thus to 
give a unified picture of bounds to the reliability of classical and 
classical-quantum channels and of some combinatorial bounds 
to the zero-error capacity. 

To this end, the paper goes through some results 
which have their own importance in different sub-fields. The 
sphere-packing bound and the zero-rate bound of Shannon, 
Gallager and Berlekamp are extended to general classical-quantum 
channels, partially solving an open problem in quantum 
information theory. A new bound to the reliability of classical 
channels inspired by Lovasz's construction and by Gallager's 
expurgated bound is presented, introducing a function $\vartheta(\rho)$ that 
varies from the cut-off rate of a channel to the Lovasz theta 
function as $\rho$ varies from 1 to $\infty$. It is then shown that this bound 
is a particular case of a more general bound to the reliability 
of a classical-quantum channel by means of the sphere-packing 
bound applied to auxiliary channels. Under this more general 
scenario, a quantity $\vartheta_{sp}$ emerges which formally generalizes $\vartheta$ 
and is at least as good in bounding the zero-error capacity. 

An interesting connection between the cut-off rate of a classical 
channel and the sphere-packing bound of a possibly underlying 
pure-state channel is obtained as a side result. 

Index Terms — Reliability function, sphere-packing bound, 
Renyi divergence, quantum Chernoff bound, classical-quantum 
channels, Lovasz' theta function, cut-off rate. 

I. Introduction 

This paper touches some topics in sub-fields of information 
theory that are usually of interest to different communities. 
For this reason, it may be useful to introduce this work with 
an overview of the different contexts. 

A. Classical Context 

One of the central topics in coding theory is the problem 
of bounding the probability of error of optimal codes for 
communication over a given channel. In his 1948 landmark 
paper [2], Shannon introduced the notion of channel capacity 
$C$, which represents the largest rate at which information 
can be sent through the channel with vanishing probability 
of error. This means that, at rates strictly smaller than the 
capacity, communication is possible with a probability of error 
that vanishes with increasing block-length. In the following 
years, important refinements of this fundamental result were 
obtained in [3], [4], [5]. In particular, it was proved that the 
probability of error $P_e$ for the optimal encoding strategy at 
rates below the capacity vanishes exponentially fast in the 
block-length, a fact that we can express as 

$$P_e \approx e^{-N E(R)} \qquad (1)$$

where $P_e$ is the probability of error, $N$ is the block-length and 
$E(R)$ is a function of the rate $R$, called channel reliability, 
which is positive (possibly infinite) for all rates smaller than 
the capacity. While Shannon's theorem made it possible to evaluate 
the capacity in a relatively simple way, determining the 
function $E(R)$ soon turned out to be a very difficult problem. 

As Shannon himself first observed and studied [6], for a 
whole class of channels communication is possible at sufficiently 
low but positive rates $R$ with probability of error 
precisely equal to zero, a fact that is usually described by 
saying that the function $E(R)$ is infinite at those rates. Shannon 
thus also defined a zero-error capacity $C_0$ of a channel as 
the supremum of all rates at which communication is possible 
with probability of error exactly equal to zero. This problem 
soon appeared as one of a radically different nature from that 
of determining the traditional capacity. The zero-error capacity 
only depends on the confusability graph of the channel, and 
determining its value is usually considered a problem of 
combinatorial nature rather than of a probabilistic one. As 
a consequence, since $C_0$ is precisely the smallest value of 
$R$ for which $E(R)$ is finite, it is clear that determining the 
precise expression for $E(R)$ is expected to be a problem of 
exceptional difficulty. The first bounds on $C_0$ were obtained by 
Shannon himself [6]. In particular, he gave the first non-trivial 
upper bound in the form $C_0 \le C_{FB}$, where $C_{FB}$ is the zero-error 
capacity with feedback,¹ which he was able to determine 
exactly by means of a clever combinatorial approach. 

In the following years, works by Fano [8], Shannon, Gallager 
and Berlekamp [9], [10] were devoted to the problem 
of bounding the function $E(R)$ for general discrete memoryless 
channels. The function could be determined exactly 
for all rates larger than some critical rate $R_{crit}$, but no 
general solution could be found for lower rates, something 
that was however expected in light of the known hardness of 
even determining the value $C_0$ at which $E(R)$ must diverge. 
An important result, based on large deviation techniques in 
probability theory, was the so called sphere-packing upper 
bound, that is, the determination of a function $E_{sp}(R)$ such 
that $E(R) \le E_{sp}(R)$ for all rates $R$. The smallest rate $R_\infty$ 
for which $E_{sp}(R)$ is finite is clearly also an upper bound 
to $C_0$, and it turned out, quite nicely, that $R_\infty = C_{FB}$ 
(whenever $C_{FB} > 0$). So, the same bound obtained by 
Shannon with a combinatorial approach based on the use of 
feedback could also be obtained indirectly from a bound based 
on a probabilistic argument. We may say that upper bounds 
to $E(R)$ and upper bounds to $C_0$ were in a sense "coherent". 
Not completely so, however, since channels with the same $C_0$ 
can have different $R_\infty$, which means that for some channels 
there was no upper bound to $E(R)$ at rates where this quantity 
was already known to be finite. 

¹ We avoid the subscript '0' in the feedback case since it is known that the 
ordinary capacity is not improved by feedback [6], [7]. 

The situation remained as such until 1979, when Lovasz 
published his ground-breaking paper [11]. Lovasz obtained 
a new upper bound to $C_0$ based on his theta function $\vartheta$ 
which, among other things, allowed him to precisely determine 
the capacity of the pentagon, the simplest graph for which 
Shannon was not able to determine the capacity. Lovasz's 
interest in this problem, however, came from a purely graph 
theoretic context, and his approach was combinatorial in 
nature, apparently very different from the probabilistic techniques 
previously used in channel coding theory. 

Lovasz's contribution is usually considered a clear indication 
that bounding $C_0$ is a problem that must be attacked 
with techniques developed in the context of combinatorics 
rather than in the probabilistic one. Links from the Lovasz 
theta function to classical information theory (see for example 
[12]) have probably not been strong enough to avoid a progressive 
independent development of two research branches, one in 
the combinatorial direction, and the other in the coding theory 
direction in its classical, more probabilistic and information 
theoretic shape. Lovasz's theta function was later recognized 
as a fundamental quantity in combinatorial optimization due 
to its relevant algebraic properties; to date, it is usually interpreted 
in the context of semidefinite relaxation/programming 
and it is probably more used in mathematics and computer 
science than in information theory. A remarkable effect of 
this trend is the fact that, contrary to what happened in the 
'60s, no advances in bounding $E(R)$ for general channels 
with a zero-error capacity were made after the appearance 
of Lovasz's work. Perhaps Lovasz's method was so 
combinatorially oriented that it did not appear simple to exploit it 
in the probabilistic context within which bounds to $E(R)$ were 
usually developed. Since 1979, thus, a "gap" exists between 
bounds to $E(R)$ and bounds to $C_0$. 

One of the main objectives of this paper is to show that 
Lovasz's work and the sphere-packing bound of Shannon, 
Gallager and Berlekamp rely on a similar idea, which can 
be described in a unified way in probabilistic terms if one 
moves to the more general setting of quantum probability. 
The right context is that of classical-quantum channels; for 
these channels an equivalent definition of reliability function 
can also be given and lower bounds to this function have been 




Fig. 1. The importance of quantum probability as a tool for the study of 
channel reliability in the low rate region. The Lovasz theta function emerges as 
a natural consequence of the sphere-packing bound in the context of classical-quantum 
channels. Here $R_\infty = C_{FB}$ (assuming $C_{FB} > 0$) refers to bounds 
derived in the classical setting. In the classical-quantum setting, channels exist 
with $R_\infty = \vartheta$. 



obtained in [13], [14] which parallel some of those known 
for classical channels. However, no upper bounds have yet 
been found for positive rates below the capacity. In this paper, 
we prove the sphere-packing bound for classical-quantum 
channels. As a result, we show that in the context of 
classical-quantum channels, the value $R_\infty$ at which $E_{sp}(R)$ 
diverges leads to an upper bound $\vartheta_{sp}$ to the zero-error capacity 
that is at least as powerful as Lovasz's theta function. In other 
words, in the context of quantum probability, Lovasz's result 
emerges naturally as a consequence of the sphere-packing 
bound. Figure 1 gives a pictorial representation of the resulting 
scenario. This shows that classical-quantum channels provide 
the right context for making bounds to $E(R)$ and bounds to 
$C_0$ coherent again, at least to the same extent as they had been 
in the '60s. 

In this paper, however, we also propose an attempt to make 
a first step toward a real unification of bounds to $E(R)$ and 
bounds to $C_0$, which means that for any channel one has a 
finite upper bound to $E(R)$ for each $R$ that is known to be 
larger than $C_0$. There are different ways of attempting such a 
unification. This paper focuses on an approach inspired by a 
common idea in Lovasz's construction and in the expurgated 
lower bound of Gallager [15]. The resulting bound to $E(R)$ is 
in many cases not tight in terms of the values it takes. It has 
however the nice property of being finite over the same range 
of rate values for all channels with the same confusability 
graph, and of making powerful bounds to $C_0$ a consequence of 
bounds to $E(R)$. Furthermore, this approach reveals interesting 
connections between the Lovasz theta function, the cut-off 
rate of classical channels, the expurgated bound of Gallager 
and the rate $R_\infty$ of classical-quantum channels. The resulting 
situation in the general case is qualitatively depicted in Figure 2. 
The bounds obtained in this paper are simply sketched in 
order to make clear that we do not claim any tightness. We 
believe however that the presented ideas shed some light on an 
unexplored path that deserves further study. The final objective 
of an investigation on this topic should be a bound, which we 
also symbolically show in Figure 2, that smoothly departs from 
the sphere-packing bound to diverge at $R = \vartheta_{sp}$. We believe 
any result in this direction would be fundamental to coding 
theory and combinatorics. 

[Figure 2: plot of upper bounds to $E(R)$ versus the rate $R$. Curves: $E_{sp}(R)$, upper bound to $E(R)$; new family of upper bounds to $E(R)$ (umbrella bounds); desirable upper bound to $E(R)$. Marked on the rate axis: $C_0$, $\vartheta_{sp}$ (upper bound to $C_0$), $C_{FB}$, $R_{crit}$, $C$.]

Fig. 2. The umbrella bounds to $E(R)$ for a classical channel derived in this 
paper and an idea of the desirable final bound. The bound $\vartheta_{sp}$ is the upper 
bound to $C_0$ that is implied by the umbrella bound; we show that $\vartheta_{sp} \le \vartheta$ 
but we have not yet established if strict inequality is possible. A "desirable" 
upper bound to $E(R)$ is also shown in the figure that smoothly corrects the 
sphere-packing bound to meet $\vartheta_{sp}$; this should be the target of future work in 
this direction. 



B. Classical-Quantum Context 

As already mentioned, classical-quantum channels play a 
central role in this paper. A number of results in the theory of 
classical communication through classical-quantum channels 
have been obtained in the past years that parallel many of the 
results obtained in the period 1948-1965 for classical channels 
(see [16] for a very comprehensive overview). As in the 
classical case, we are here primarily concerned with the study 
of error exponents for optimal transmission at rates below the 
channel capacity. Upper bounds to the probability of error of 
optimal codes for pure-state channels were obtained by Burnashev 
and Holevo [13] that are the equivalent of the so called random 
coding bound obtained by Fano [8] and Gallager [15] and of 
the expurgated bound of Gallager [15] for classical channels. 
The expurgated bound was then extended to general quantum 
channels by Holevo [14]. The formal extension of the random 
coding bound expression to mixed states is conjectured to 
represent an upper bound for the general case but no proof 
has been obtained yet (see [13], [14]). 

A missing step in these quantum versions of the classical 
results is an equivalent of the sphere-packing bound. This 
is probably due to the fact that a complete solution for 
the problem of the asymptotic error exponents in quantum 
hypothesis testing has been obtained only very recently. In 
particular, the so called quantum Chernoff bound was obtained 
in [17] for the direct part, and in [18] for the converse part 
(both results were obtained in 2006, see [19] for an extensive 
discussion). 

Those two works also essentially provided the basic tools 
that enabled the solution of the so called asymmetric problem 
in [20], [19], where the set of achievable pairs of error 
exponents for the two hypotheses is determined. This result 
is usually called the Hoeffding bound in the quantum statistics 
literature. The authors in [19] attribute the result for the classical 
case also to Blahut [21] and Csiszar and Longo [22]. It is 
the author's impression, however, that the result was already 
known much earlier, at least among information theorists at 
MIT, since it is essentially used in Fano's 1961 book [8] 
(even if not explicitly stated in terms of hypothesis testing) 
and partially attributed to some 1957 unpublished seminar 
notes by Shannon (see also [4] for an example of Shannon's 
early familiarity with the Chernoff bound). A more explicit 
formulation in terms of binary hypothesis testing is contained 
in [9], [10] in a very general form, which already considers 
the case of distributions with different supports (compare with 
[21] and see for example [19, Sec. 5.2]) and also provides 
results for finite observation lengths and varying statistical 
distributions (check for example [10, Th. 1, p. 524]). This 
is in fact what is needed in studying error exponents for 
hypothesis testing between different codewords of a code for 
a general discrete memoryless channel. 

The main difference in the study of error exponents in 
binary hypothesis testing contained in [21] and [22] is that 
those papers focus more on the role of the Kullback-Leibler 
discrimination (or relative entropy), which can be used as 
a building block for the study of the whole problem in 
the classical case. In [9], [10], instead, what is now known 
as Renyi divergence was used as the building block. The 
two approaches are equivalent in the classical case, and it 
was historically the presentation in terms of the ubiquitous 
Kullback-Leibler divergence which emerged as the preferred 
one as opposed to the Renyi divergence. Along this same line, 
a simpler and elegant proof of the sphere-packing bound, again 
in terms of the Kullback-Leibler divergence, was derived by 
Haroutunian [23] by comparing the channel under study with 
dummy channels with smaller capacity. This proof, which is 
substantially simpler than the one presented in [9], was then 
popularized by [24], and became the preferred proof for this 
classical result. 

As a matter of fact, however, as pointed out in [20, Sec. 4, 
Remark 1] and [19, Sec. 4.8], it turns out that the solution to 
the study of error exponents in quantum hypothesis testing can 
be expressed in terms of the Renyi divergence and not in terms 
of the Kullback-Leibler divergence. Thus, since the sphere-packing 
bound is essentially based on the theory of binary 
hypothesis testing, it is reasonable to expect that Haroutunian's 
approach to the sphere-packing bound may fail in the quantum 
case. This could be, in our opinion, the reason why a quantum 
sphere-packing bound has not been established yet. 

In this paper, we propose a derivation of the sphere-packing 
bound for classical-quantum channels by following closely the 
approach used in [9]. The quantum case is related to the 
classical one by means of the Nussbaum-Szkola mapping [18], 
which represented the key point in proving the converse part of 
the quantum Chernoff bound (see [19] for more details). This 
allows us to formulate a quantum version of the Shannon-Gallager-Berlekamp 
generalization of the Chernoff bound 
[9, Th. 5] on binary hypothesis testing (in the converse part). The 
proof of the sphere-packing bound used in [9] will then be 
adapted to obtain the equivalent bound for classical-quantum 
channels. This proves the power of the methods employed 
in [9]. The mentioned generalization of the Chernoff bound, 
furthermore, allows us to adapt the technique used in [10] to find 
an upper bound to the reliability at $R = 0$, which leads to an 
exact expression when combined with the expurgated bound 
proved by Holevo [14]. 

C. Paper overview 

The paper is structured as follows. In Section II we introduce 
the notation and the basic notions on classical and classical-quantum 
channels, and on the main statistical tools used in 
this paper. In Section III we introduce what we call the "umbrella 
bound" in its simplest and self-contained form, as a preview 
of what will then be done in the context of classical-quantum 
channels. The aim of this section is to show how Lovasz's 
idea can be extended to bound the reliability function $E(R)$ 
at all rates larger than the Lovasz theta function. This section 
also prepares the reader for the interpretation of Lovasz's 
representations as auxiliary channels, and it points out an 
interesting connection between the Lovasz theta function, the 
cut-off rate and the expurgated bound. 

We then start considering bounds for classical-quantum 
channels. In Section IV we develop a fundamental bound to the 
probability of error in a binary decision test between quantum 
states. This bound is a quantum version of the converse 
part of the Shannon-Gallager-Berlekamp generalization of the 
Chernoff bound [9]. In Section V this tool is used to prove 
the sphere-packing bound for classical-quantum channels. The 
proof of the bound follows the approach in [9], which contains 
the key idea that is also found in Lovasz's bound. Part of the 
results presented in Sections IV and V were first presented 
in [1]. In Section VI we provide a detailed analysis of the 
analogy between the two bounds. In doing so, we generalize 
a result of Csiszar [25], showing that the quantum sphere-packing 
bound can be written in terms of an information 
radius which appears to be the true leading theme that puts the 
Lovasz theta function, the cut-off rate, the rate $R_\infty$ and the 
ordinary capacity $C$ under the same light. This section also 
presents a possible extension $\vartheta_{sp}$ of Lovasz's theta function, 
that naturally arises from the sphere-packing bound, whose 
true potential is however still to be understood. In Section 
VII we provide a more detailed analysis of classical and pure-state 
channels, proving a nice connection between the cut-off 
rate of a classical channel and the rate $R_\infty$ of a pure-state 
channel that could underlie the classical one. This leads as a 
side result to an interesting expression for the cut-off rate as 
a minimum maximum eigenvalue. 

In Section VIII, then, we reconsider the umbrella bound 
anticipated in Section III, giving it a more general and generally 
more powerful form that allows us to bound the reliability 
function $E(R)$ of a classical-quantum channel in terms of 
the sphere-packing bound $E_{sp}(R)$ of an auxiliary classical-quantum 
channel. Finally, in Section IX we consider the 
special case of channels with no zero-error capacity, for which 
we present the quantum extension of some classical bounds. 
In particular, we show that the zero-rate bound in [10] can 
be extended to classical-quantum channels by means of the 
results of Section IV. A bound of Blahut is then discussed and 
revised in its general applicability to both classical and 
classical-quantum channels. 

II. Basic Notions and Notations 

The choice of the notation is difficult since different references 
are used that adopted different notations, sometimes 
using the same symbol with a different meaning.² For this 
reason, we prefer to present the choice of notation in detail 
while discussing the basic results on classical and classical-quantum 
channels, on the divergences used and on the zero-error 
capacity. Since the main technical contribution derived 
in this paper is based on [9], we prefer to follow as closely as 
possible the notation used there. For the quantum part we try 
to use the notation of [14] and of [19]. Readers familiar with 
the possible different notations used in the literature may skip 
Sections II-A through II-D. 

A. Classical Channels 

We introduce here the required basic notation and results 
on classical channels (see [26], [27] for more details). 

Let $P(j|k)$, $1 \le j \le J$, $1 \le k \le K$, be the transition 
probabilities of a discrete memoryless channel with 
input alphabet $\{1, 2, \ldots, K\}$ and output alphabet $\{1, 2, \ldots, J\}$. If 
$\mathbf{x} = (k_1, k_2, \ldots, k_N)$ is a sequence of $N$ input symbols and 
correspondingly $\mathbf{y} = (j_1, j_2, \ldots, j_N)$ is a sequence of output 
symbols, then the probability of observing $\mathbf{y}$ at the output of 
the channel given $\mathbf{x}$ at the input is 

$$P(\mathbf{y}|\mathbf{x}) = \prod_{n=1}^{N} P(j_n|k_n). \qquad (2)$$



A block code with parameters $M$ and $N$ is a mapping from a 
set $\{1, 2, \ldots, M\}$ of $M$ messages onto a set $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ 
of $M$ sequences each composed of $N$ symbols from the input 
alphabet. The rate $R$ of the code is defined as $R = (\log M)/N$. 
A decoder is a mapping from the set of length-$N$ sequences 
of symbols from the output alphabet into the set of possible 
messages $\{1, 2, \ldots, M\}$. If message $m$ is to be sent, the 
encoder transmits the codeword $\mathbf{x}_m$ through the channel. An 
output sequence $\mathbf{y}$ is received by the decoder from the channel, 
which maps it to a message $\hat{m}$. An error occurs if $\hat{m} \neq m$. 

Let $Y_m$ be the set of output sequences that are mapped to 
the message $m$. When message $m$ is sent, the probability of 
error is 

$$P_{e|m} = \sum_{\mathbf{y} \notin Y_m} P(\mathbf{y}|\mathbf{x}_m). \qquad (3)$$

The maximum error probability of the code is defined as the 
largest $P_{e|m}$, that is, 

$$P_{e,\max} = \max_{m} P_{e|m}. \qquad (4)$$
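As a concrete illustration of (2), (3) and (4), the following minimal sketch evaluates the maximum error probability of a length-3 repetition code with majority decoding on a binary symmetric channel. The channel, the code, the decoder and all function names are assumptions chosen for this example, and symbols are indexed from 0 rather than from 1.

```python
from itertools import product

def seq_prob(P, x, y):
    """P(y|x) for a memoryless channel: product of per-symbol terms, as in (2)."""
    prob = 1.0
    for k, j in zip(x, y):
        prob *= P[k][j]          # P[k][j] stores P(j|k)
    return prob

def max_error_probability(P, codewords, decode, n_outputs):
    """Largest P_{e|m}, where P_{e|m} sums P(y|x_m) over outputs not decoded as m."""
    N = len(codewords[0])
    worst = 0.0
    for m, x in enumerate(codewords):
        p_err = sum(seq_prob(P, x, y)
                    for y in product(range(n_outputs), repeat=N)
                    if decode(y) != m)
        worst = max(worst, p_err)
    return worst

eps = 0.1                                   # assumed crossover probability
bsc = [[1 - eps, eps], [eps, 1 - eps]]
code = [(0, 0, 0), (1, 1, 1)]               # M = 2 codewords, N = 3

def majority(y):
    """Decode to message 1 when at least two received symbols are 1."""
    return int(sum(y) >= 2)

pe_max = max_error_probability(bsc, code, majority, 2)
```

For this toy code an error requires at least two flipped symbols, so the result can be checked by hand against $3\epsilon^2(1-\epsilon) + \epsilon^3$.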



² For example, the parameter $s$ in [14] is used with a different meaning 
than in [9] and [19]. In [13] and in [14], non-standard names are used for 
the functions involved in the random coding and in the expurgated bounds. 
In [11], a non-logarithmic definition of the $\vartheta$ function is used which makes 
comparison with other information theoretic quantities less transparent. In 
[25], a different convention is used for the Renyi divergence than in [9]. 
Let $P_{e,\max}^{(N)}(R)$ be the minimum maximum error probability 
among all codes of length $N$ and rate at least $R$. Shannon's 
theorem states that sequences of codes exist such that 
$P_{e,\max}^{(N)}(R) \to 0$ as $N \to \infty$ for all rates smaller than a constant 
$C$, called channel capacity, which is given by the expression 

$$C = \max_{p} \sum_{k,j} p_k P(j|k) \log \frac{P(j|k)}{\sum_i p_i P(j|i)} \qquad (5)$$

where the maximum is over all probability distributions on the 
input alphabet. For rates $R < C$, $P_{e,\max}^{(N)}(R)$ is known to have 
an exponential decrease in $N$, and it is thus useful to define 
the reliability function of the channel as 

$$E(R) = \limsup_{N \to \infty} -\frac{1}{N} \log P_{e,\max}^{(N)}(R). \qquad (6)$$



It is known that the same function $E(R)$ results if in (6) one 
substitutes $P_{e,\max}^{(N)}(R)$ with the minimum average probability of 
error $P_{e}^{(N)}(R)$, defined as the minimum of $P_e = \frac{1}{M} \sum_m P_{e|m}$ over 
all codes of block length $N$ and rate at least $R$. 
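The capacity expression (5) can be evaluated numerically for small channels. The sketch below is only an illustration under stated assumptions: it uses a binary symmetric channel with crossover probability 0.1, natural logarithms, and a plain grid search over binary input distributions in place of a proper optimization routine. The function name `mutual_information` is hypothetical.

```python
import math

def mutual_information(p, P):
    """Objective of (5): sum_{k,j} p_k P(j|k) log(P(j|k) / sum_i p_i P(j|i)), in nats."""
    J = len(P[0])
    q = [sum(p[i] * P[i][j] for i in range(len(p))) for j in range(J)]
    total = 0.0
    for k, pk in enumerate(p):
        for j in range(J):
            if pk > 0 and P[k][j] > 0:      # zero terms contribute nothing
                total += pk * P[k][j] * math.log(P[k][j] / q[j])
    return total

eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]      # bsc[k][j] = P(j|k)

# Grid search over input distributions (a, 1 - a); the grid contains a = 0.5.
grid = [i / 1000 for i in range(1001)]
capacity = max(mutual_information([a, 1 - a], bsc) for a in grid)

# Known closed form for the binary symmetric channel: log 2 - h(eps).
closed_form = math.log(2) + eps * math.log(eps) + (1 - eps) * math.log(1 - eps)
```

For channels with more inputs a grid search does not scale; an iterative scheme such as the Blahut-Arimoto algorithm would be the usual replacement.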

For almost all channels, the function $E(R)$ is known only 
in the region of high rates. The random coding lower bound 
to $E(R)$ is given by the expression 

$$E(R) \ge \max_{0 \le \rho \le 1} \left[ E_0(\rho) - \rho R \right],$$

$$E_0(\rho) = \max_{p} E_0(\rho, p),$$

$$E_0(\rho, p) = -\log \sum_{j=1}^{J} \left( \sum_{k=1}^{K} p_k P(j|k)^{1/(1+\rho)} \right)^{1+\rho}.$$
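The function $E_0(\rho, p)$ is straightforward to evaluate. The following sketch computes it for a binary symmetric channel with the uniform input distribution, which is assumed here to be the maximizing one by symmetry. The channel parameters and the names `E0` and `random_coding_exponent` are assumptions made for illustration, and the maximization over $\rho$ is replaced by a grid search.

```python
import math

def E0(rho, p, P):
    """E_0(rho, p) = -log sum_j (sum_k p_k P(j|k)^{1/(1+rho)})^{1+rho}, in nats."""
    total = 0.0
    for j in range(len(P[0])):
        inner = sum(pk * P[k][j] ** (1.0 / (1.0 + rho)) for k, pk in enumerate(p))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def random_coding_exponent(R, p, P, steps=1000):
    """Grid approximation of max over 0 <= rho <= 1 of E_0(rho, p) - rho * R."""
    return max(E0(i / steps, p, P) - (i / steps) * R for i in range(steps + 1))

eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]      # bsc[k][j] = P(j|k)
uniform = [0.5, 0.5]

e0_at_zero = E0(0.0, uniform, bsc)          # E_0 vanishes at rho = 0
cutoff = E0(1.0, uniform, bsc)              # E_0 at rho = 1, the cut-off rate
exponent_at_zero_rate = random_coding_exponent(0.0, uniform, bsc)
```

For this channel the value at $\rho = 1$ has the closed form $-\log\left(\tfrac{1}{2} + \sqrt{\epsilon(1-\epsilon)}\right)$, which gives a quick sanity check.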

This bound is tight in the high rate region. This is proved by 
means of the sphere-packing upper bound, which states that 
$E(R) \le E_{sp}(R)$, where 

$$E_{sp}(R) = \sup_{\rho \ge 0} \left[ E_0(\rho) - \rho R \right].$$

For those rates $R$ for which $E_{sp}(R)$ is achieved by a $\rho \le 1$, the 
two bounds coincide and thus determine exactly the reliability 
function. In many cases there is a rate $R_{crit}$, called critical 
rate, for which $E_{sp}(R)$ is achieved by $\rho = 1$ and this leads to a 
precise determination of $E(R)$ in the interval $R_{crit} \le R \le C$. 
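The relation between the two bounds can be checked numerically. The sketch below is an illustration only: it assumes a binary symmetric channel with crossover probability 0.1 and the uniform input distribution, truncates the supremum over $\rho$ at an arbitrary value of 20, and uses a grid search. The names `exponent`, `random_coding` and `sphere_packing` are hypothetical.

```python
import math

def E0(rho, p, P):
    """E_0(rho, p) in nats, with P[k][j] = P(j|k)."""
    total = 0.0
    for j in range(len(P[0])):
        inner = sum(pk * P[k][j] ** (1.0 / (1.0 + rho)) for k, pk in enumerate(p))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def exponent(R, p, P, rho_max, step=0.01):
    """Grid search of E_0(rho, p) - rho * R over 0 <= rho <= rho_max."""
    n = int(round(rho_max / step))
    return max(E0(i * step, p, P) - (i * step) * R for i in range(n + 1))

eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]
uniform = [0.5, 0.5]

def random_coding(R):
    return exponent(R, uniform, bsc, rho_max=1.0)

def sphere_packing(R):
    # The true supremum is over all rho >= 0; 20 is an assumed truncation.
    return exponent(R, uniform, bsc, rho_max=20.0)

# Rates in nats: 0.30 lies above the critical rate of this channel, 0.05 below.
high_gap = sphere_packing(0.30) - random_coding(0.30)
low_gap = sphere_packing(0.05) - random_coding(0.05)
```

At the higher rate the optimizing $\rho$ is below 1 and the two values agree; at the lower rate the sphere-packing value is strictly larger, which is the region where $E(R)$ is not determined by these two bounds.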

In the low rate region, two important different cases are to be 
distinguished. If there is at least one pair of inputs $k$ and $i$ such 
that $P(j|k)P(j|i) = 0 \;\; \forall j$, then communication is possible 
at very low rates with probability of error exactly equal to 
zero. In this case we say that the channel has a zero-error 
capacity $C_0$, which is the supremum of all rates $R$ for which 
$P_{e,\max}^{(N)}(R) = 0$ for some $N$. Otherwise, the channel has no 
zero-error capacity, and the probability of error, though small, 
is always positive. An improvement over the random coding 
bound is given by Gallager's expurgated bound, which states 
that $E(R) \ge E_{ex}(R)$, where 

$$E_{ex}(R) = \sup_{\rho \ge 1} \left[ E_x(\rho) - \rho R \right],$$

$$E_x(\rho) = \max_{p} E_x(\rho, p),$$

$$E_x(\rho, p) = -\rho \log \sum_{k,i} p_k p_i \left( \sum_{j} \sqrt{P(j|k) P(j|i)} \right)^{1/\rho}.$$
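A short numerical sketch of $E_x(\rho, p)$ follows, again for an assumed binary symmetric channel with crossover probability 0.1 and uniform inputs; the function name `Ex` is hypothetical. It illustrates that $E_x(1, p)$ coincides with the value of $E_0(1, p)$, and that $E_x$ grows with $\rho$, which is what allows the expurgated bound to improve on the random coding bound at low rates.

```python
import math

def Ex(rho, p, P):
    """E_x(rho, p) = -rho log sum_{k,i} p_k p_i (sum_j sqrt(P(j|k)P(j|i)))^{1/rho}."""
    J = len(P[0])
    total = 0.0
    for k, pk in enumerate(p):
        for i, pi in enumerate(p):
            overlap = sum(math.sqrt(P[k][j] * P[i][j]) for j in range(J))
            total += pk * pi * overlap ** (1.0 / rho)
    return -rho * math.log(total)

eps = 0.1
bsc = [[1 - eps, eps], [eps, 1 - eps]]      # bsc[k][j] = P(j|k)
uniform = [0.5, 0.5]

ex_at_one = Ex(1.0, uniform, bsc)           # equals E_0(1, p) for this p
ex_large = Ex(50.0, uniform, bsc)           # approaches the zero-rate value
zero_rate_limit = -0.5 * math.log(2 * math.sqrt(eps * (1 - eps)))
```

The quantity `zero_rate_limit` is the limit of $E_x(\rho, p)$ as $\rho \to \infty$ for this specific channel and input distribution.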

In the low rate region, known upper bounds differ substantially 
depending on whether the channel has a zero-error capacity or 
not. The function $E_{sp}(R)$ goes to infinity for rates $R$ smaller 
than the quantity 

$$R_\infty = \max_{p} \left[ -\log \max_{j} \sum_{k : P(j|k) > 0} p_k \right] \qquad (7)$$

which is in the general case larger than $C_0$, even in cases where 
$C_0 = 0$. No general improvement has been obtained in this 
low rate region over the sphere-packing bound in the general 
case of channels with a zero-error capacity. For channels 
with no zero-error capacity, instead, a major improvement was 
obtained in [9], [10], where it is proved that the expurgated 
bound is tight at $R = 0$ and that it is possible to upper bound 
$E(R)$ by the so called straight line bound, which is a segment 
connecting the plot of $E_{ex}(R)$ at $R = 0$ to the function 
$E_{sp}(R)$ tangentially to the latter. For the specific case of the 
binary symmetric channel, furthermore, much more 
powerful bounds are available [28], [29]. 

B. Classical-Quantum Channels 

We introduce here the basic notions and results on classical-quantum 
channels that will be needed in this paper. For an 
introduction to the topic the reader may refer to [30], [31]. 

Following [14], consider a classical-quantum channel with 
an input alphabet of $K$ symbols $\{1, \ldots, K\}$ with associated 
density operators $S_k$, $k = 1, \ldots, K$, in a finite dimensional 
Hilbert space $\mathcal{H}$. The $N$-fold product channel acts in the 
tensor product space $\mathcal{H}^{\otimes N}$ of $N$ copies of $\mathcal{H}$. To a codeword 
$\mathbf{w} = (k_1, k_2, \ldots, k_N)$ is associated the signal state 
$S_{\mathbf{w}} = S_{k_1} \otimes \cdots \otimes S_{k_N}$. A block code with $M$ codewords 
is a mapping from a set of $M$ messages $\{1, \ldots, M\}$ into 
a set of $M$ codewords $\mathbf{w}_1, \ldots, \mathbf{w}_M$. A quantum decision 
scheme for such a code is a collection of $M$ positive operators 
$\{\Pi_1, \Pi_2, \ldots, \Pi_M\}$ such that $\sum_m \Pi_m \le \mathbb{1}$. The rate of the code 
is defined as 

$$R = \frac{\log M}{N}. \qquad (8)$$

The probability that message $m'$ is decoded when message 
$m$ is transmitted is $P(m'|m) = \operatorname{Tr} \Pi_{m'} S_{\mathbf{w}_m}$. The probability 
of error after sending message $m$ is 

$$P_{e|m} = 1 - \operatorname{Tr}\left( \Pi_m S_{\mathbf{w}_m} \right). \qquad (9)$$

We define the maximum probability of error of the code 

$$P_{e,\max} = \max_{m} P_{e|m}. \qquad (10)$$

For any positive $R$ and integer $N$, we define $P_{e,\max}^{(N)}(R)$ as the 
minimum maximum error probability over all codes of block 
length $N$ and rate at least $R$. 
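To make the definitions (9) and (10) concrete, here is a minimal sketch for a pure-state channel with block length 1 and two messages. The states $|0\rangle$ and $|+\rangle$, the projective measurement in the computational basis and all helper names are assumptions made for this example; the measurement is not claimed to be optimal. Real-valued 2x2 matrices are enough here, so plain Python lists are used.

```python
import math

def outer(v):
    """Rank-one operator |v><v| for a real vector v."""
    return [[a * b for b in v] for a in v]

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][t] * B[t][c] for t in range(n)) for c in range(n)]
            for r in range(n)]

def trace(A):
    return sum(A[r][r] for r in range(len(A)))

s = 1.0 / math.sqrt(2.0)
signal_states = [outer([1.0, 0.0]), outer([s, s])]      # S_1 = |0><0|, S_2 = |+><+|
decision = [outer([1.0, 0.0]), outer([0.0, 1.0])]       # Pi_1 = |0><0|, Pi_2 = |1><1|

# P_{e|m} = 1 - Tr(Pi_m S_{w_m}) for each message (0-based index here).
pe = [1.0 - trace(matmul(decision[m], signal_states[m])) for m in range(2)]
pe_max = max(pe)
```

The first message is always decoded correctly, while the second is misread half of the time because $|+\rangle$ has equal overlap with both measurement vectors.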
For rates $R$ smaller than the capacity of the channel, 
$P_{e,\max}^{(N)}(R)$ goes to zero exponentially fast in $N$. The reliability 
function of the channel is defined as 

$$E(R) = \limsup_{N \to \infty} -\frac{1}{N} \log P_{e,\max}^{(N)}(R). \qquad (11)$$

A pure-state channel is one with all density operators $S_k$ of 
rank one, in which case we write $S_k = |\psi_k\rangle\langle\psi_k|$. If, on the 
other hand, all density operators $S_k$ commute, then they are all 
simultaneously diagonal in some basis. In this case it is easily 
proved that the optimal measurements are also diagonal in the 
same basis and, thus, the classical-quantum channel reduces 
to a classical one. Each classical channel can then be thought 
of as a classical-quantum one with all commuting operators 
$S_k$. 

Lower bounds to the reliability function of a pure-state 
channel were obtained by Burnashev and Holevo [13], who 
extended the random coding, the expurgated and the zero-rate bounds 
to this case. The random coding lower bound can be stated as 
in the classical case with the only modification 

$$E_0(\rho, p) = -\log \operatorname{Tr} \left( \sum_k p_k S_k \right)^{1+\rho}$$

and the expurgated bound, as well, can be stated as in the 
classical case with the modification 

$$E_x(\rho, p) = -\rho \log \sum_{k,i} p_k p_i \, |\langle \psi_k | \psi_i \rangle|^{2/\rho}.$$

The expurgated bound has then been extended in [14] to 
general channels, and in this case we have 

$$E_x(\rho, p) = -\rho \log \sum_{k,i} p_k p_i \left( \operatorname{Tr} \sqrt{S_k} \sqrt{S_i} \right)^{1/\rho}$$

but no extension of the random coding bound has been 
obtained yet. To the best of this author's knowledge, the best 
currently available lower bound was obtained in [32]. 

C. Zero-Error Capacity 

For both the classical and the classical-quantum case, we can
define the zero-error capacity $C_0$ of the channel as the supremum
of all rates $R$ for which $P_{e,\max}^{(N)}(R) = 0$ is possible for some
$N$. The zero-error capacity only depends on the confusability
graph $G$ of the input symbols and, for this reason, one often
speaks of the zero-error capacity as the capacity of a graph.
In the classical case, symbols $k$ and $i$ are confusable if and
only if $P(j|k) P(j|i) > 0$ for some $j$. In the classical-quantum
case, they are confusable if and only if $\operatorname{Tr}(S_k S_i) > 0$. The
confusability graph $G$ is a graph with vertices indexed by the
input symbols of the channel, where two vertices are connected
by an arc in the graph if they can be confused. The fact that the
zero-error capacity only depends on the confusability graph
is obvious in the classical case and easily seen also in the
classical-quantum case. In fact, if a code with $M$ codewords
satisfies $P_{e,\max} = 0$, then for each $m \neq m'$ we must have
$\operatorname{Tr}(\Pi_m S_{w_m}) = 1$ and $\operatorname{Tr}(\Pi_m S_{w_{m'}}) = 0$. This is possible if
and only if the signals $S_{w_m}$ and $S_{w_{m'}}$ are orthogonal. This in turn
implies that there is some $i$ such that $w_m$ and $w_{m'}$ contain,
in their $i$-th component, non-confusable symbols in the above
sense.
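The classical confusability rule above can be sketched in a few lines. This is only an illustration: the function name `confusability_edges` and the five-input example channel are assumptions introduced here, not objects defined in the paper.

```python
from itertools import combinations

def confusability_edges(P):
    """Return the set of confusable input pairs (k, i) with k < i.

    P[k][j] is the transition probability P(j|k); inputs k and i are
    confusable when P(j|k) * P(j|i) > 0 for at least one output j.
    """
    K = len(P)
    return {
        (k, i)
        for k, i in combinations(range(K), 2)
        if any(a * b > 0 for a, b in zip(P[k], P[i]))
    }

# Illustrative channel (an assumption): input k is received as output k
# or (k + 1) mod 5, each with probability 1/2.
PENTAGON = [[0.5 if j in (k, (k + 1) % 5) else 0.0 for j in range(5)]
            for k in range(5)]

edges = confusability_edges(PENTAGON)
```

For this example channel the resulting graph is the pentagon, the graph used later in the text as an example of a non-trivial zero-error capacity.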

Finding the zero-error capacity remains an unsolved problem.
As mentioned before, a first upper bound to $C_0$ was
obtained by Shannon by means of an argument based on
feedback. He could prove that the zero-error capacity, when
feedback is available, is given by the expression

$$ C_{fb} = \max_{\mathbf{p}} \, -\log \max_j \sum_{k : P(j|k) > 0} p_k \tag{12} $$

whenever $C_0 > 0$. The expression above is precisely the value
$R_\infty$ at which the sphere-packing bound diverges (whether
$C_0 > 0$ or not). The best bound is then obtained by using the
channel (with the given confusability graph) which minimizes
the above value.

A major breakthrough was obtained by Lovász in terms of
his theta function. He could prove that $C_0 \le \vartheta$, where

$$ \vartheta = \min_{\{u_k\}} \min_{c} \max_k \log \frac{1}{|\langle u_k | c \rangle|^2}, $$

where the outer minimum is over all sets of unit norm vectors
in any Hilbert space such that $u_k$ and $u_i$ are orthogonal if
symbols $k$ and $i$ cannot be confused, the inner minimum is
over all unit norm vectors $c$, and the maximum is over all input
symbols. See also [12] and [33] for interesting comments on
Lovász's result.
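As an illustration of the definition above, the following sketch evaluates the value of one particular representation for the pentagon graph, namely the well-known five-rib "umbrella" in three dimensions. The helper names are assumptions made for this example, and the handle is fixed by hand rather than optimized, so the number obtained is only an upper bound on $\vartheta$ for this graph.

```python
import math

def umbrella_vectors():
    """Unit vectors u_0..u_4 forming a five-rib 'umbrella' in R^3."""
    c2 = 1.0 / math.sqrt(5.0)   # squared cosine of the rib-to-handle angle
    cos_t, sin_t = math.sqrt(c2), math.sqrt(1.0 - c2)
    return [
        (cos_t,
         sin_t * math.cos(2.0 * math.pi * k / 5.0),
         sin_t * math.sin(2.0 * math.pi * k / 5.0))
        for k in range(5)
    ]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

U = umbrella_vectors()
handle = (1.0, 0.0, 0.0)

# In the pentagon, the non-confusable pairs are k and k + 2 (mod 5);
# their vectors must be orthogonal.
worst_overlap = max(abs(dot(U[k], U[(k + 2) % 5])) for k in range(5))

# Value of this representation for this handle: max_k log 1/|<u_k|c>|^2.
value = max(math.log(1.0 / dot(u, handle) ** 2) for u in U)
```

The computed value is $\frac{1}{2}\log 5$ (in nats), which is the figure Lovász obtained for the pentagon.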

Shortly afterwards, Haemers [34], [35] obtained another
interesting upper bound to $C_0$, which he proved to be strictly
better than $\vartheta$ in some cases. Haemers' bound asserts that $C_0$
is upper bounded by the logarithm of the rank of any $K \times K$ matrix $A$ with
elements $A_{k,k} \neq 0$ and $A_{k,i} = 0$ if $k$ and $i$ are non-confusable
inputs. This bound is in many cases looser than Lovász's one,
but Haemers proved it to be tighter for the graph which is the
complement of the so-called Schläfli graph.

Recently, an extension of the Lovász theta function for the
quantum communication problem has been derived in [29],
which is based on algebraic properties satisfied by the Lovász
theta function. To the best of this author's knowledge, the
study of the zero-error capacity that we consider for classical-quantum
channels is not readily put in relation with the
results of [29]. There, the authors propose an extension
for general quantum channels based on what they call non-commutative
graphs, while we derive some variant of the
Lovász theta function even if always considering classical
confusability graphs in this paper. Further work would however be
required to understand if there is a possible unified framework
for the bounds to $E(R)$ (and thus to $C_0$) derived here and the
results of [29].



D. Distances and Divergences 

In this paper, a fundamental role is played by statistical 
measures of dissimilarity between probability distributions 
and between density operators. This section defines the
notation used and recalls the properties of those measures that will
be needed in the rest of the paper.

Footnote: We use a logarithmic version of the original theta function as defined by
Lovász, so as to avoid logarithms in the formulation of the bounds.



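In classical binary hypothesis testing between two probability
distributions $\mathbf{p}$ and $\mathbf{q}$, a fundamental role is played
by the function $\mu_{\mathbf{p},\mathbf{q}}(s)$ defined by

$$ \mu_{\mathbf{p},\mathbf{q}}(s) = \log \sum_k p_k^{1-s} q_k^{s}, \qquad 0 < s < 1 \tag{13} $$

and

$$ \mu_{\mathbf{p},\mathbf{q}}(0) = \lim_{s \to 0} \mu_{\mathbf{p},\mathbf{q}}(s) \quad \text{and} \quad \mu_{\mathbf{p},\mathbf{q}}(1) = \lim_{s \to 1} \mu_{\mathbf{p},\mathbf{q}}(s). \tag{14} $$

The minimum value of $\mu_{\mathbf{p},\mathbf{q}}(s)$ in the interval $[0, 1]$ is of
importance for the study of symmetric binary hypothesis
testing and it is convenient to introduce the Chernoff distance
$d_C(\mathbf{p},\mathbf{q})$ between the two probability distributions $\mathbf{p}$ and $\mathbf{q}$,
which is defined by

$$ d_C(\mathbf{p},\mathbf{q}) = -\min_{0 \le s \le 1} \mu_{\mathbf{p},\mathbf{q}}(s). \tag{15} $$

It will also be useful later to discuss the relation between the
Chernoff distance and other distance measures. Of particular
importance is the Bhattacharyya distance, which we define here
as

$$ d_B(\mathbf{p},\mathbf{q}) = -\mu_{\mathbf{p},\mathbf{q}}(1/2) \tag{16} $$
$$ \phantom{d_B(\mathbf{p},\mathbf{q})} = -\log \sum_k \sqrt{p_k q_k}. \tag{17} $$

It is known that the function $\mu_{\mathbf{p},\mathbf{q}}(s)$ is a convex function of
$s$ and from this the following inequalities are deduced

$$ d_B(\mathbf{p},\mathbf{q}) \le d_C(\mathbf{p},\mathbf{q}) \le 2 d_B(\mathbf{p},\mathbf{q}). \tag{18} $$

Examples are easily found showing that both equalities are
possible.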
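The classical quantities (13)-(18) are easy to evaluate numerically. The following is a minimal sketch, assuming distributions with overlapping supports given as Python lists; the function names and the grid search used for the minimum in (15) are choices made for this illustration, not part of the paper.

```python
import math

def mu(p, q, s):
    """mu_{p,q}(s) = log sum_k p_k^(1-s) q_k^s, for 0 < s < 1."""
    return math.log(sum(a ** (1.0 - s) * b ** s
                        for a, b in zip(p, q) if a > 0 and b > 0))

def bhattacharyya(p, q):
    """d_B = -mu(1/2)."""
    return -mu(p, q, 0.5)

def chernoff(p, q, steps=10000):
    """Approximate d_C by minimising mu on a uniform grid in (0, 1)."""
    return -min(mu(p, q, i / steps) for i in range(1, steps))

p = [0.9, 0.1]
q = [0.2, 0.8]
d_b = bhattacharyya(p, q)
d_c = chernoff(p, q)
```

For this pair the sum in (17) equals $1/\sqrt{2}$, so $d_B = \frac{1}{2}\log 2$, and the computed values satisfy the chain (18).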

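We now introduce the corresponding quantities for the
quantum case, that is, when two density operators $\varrho$ and $\varsigma$ are to be
distinguished in place of the two distributions $\mathbf{p}$ and $\mathbf{q}$. The
function $\mu_{\varrho,\varsigma}(s)$ is defined by

$$ \mu_{\varrho,\varsigma}(s) = \log \operatorname{Tr} \varrho^{1-s} \varsigma^{s}, \qquad 0 < s < 1 \tag{19} $$

and

$$ \mu_{\varrho,\varsigma}(0) = \lim_{s \to 0} \mu_{\varrho,\varsigma}(s) \quad \text{and} \quad \mu_{\varrho,\varsigma}(1) = \lim_{s \to 1} \mu_{\varrho,\varsigma}(s). \tag{20} $$

The Chernoff and the Bhattacharyya distances $d_C(\varrho,\varsigma)$ and
$d_B(\varrho,\varsigma)$ between the two density operators $\varrho$ and $\varsigma$ are then
defined as in the classical case by

$$ d_C(\varrho,\varsigma) = -\min_{0 \le s \le 1} \mu_{\varrho,\varsigma}(s) \tag{21} $$

and

$$ d_B(\varrho,\varsigma) = -\mu_{\varrho,\varsigma}(1/2) \tag{22} $$
$$ \phantom{d_B(\varrho,\varsigma)} = -\log \operatorname{Tr} \sqrt{\varrho} \sqrt{\varsigma}, \tag{23} $$

and they are again related by the inequalities

$$ d_B(\varrho,\varsigma) \le d_C(\varrho,\varsigma) \le 2 d_B(\varrho,\varsigma). \tag{24} $$

In the quantum case, however, another important measure of
the difference between two quantum states is the so-called
fidelity between the two states, which is given by the expression
$\operatorname{Tr} |\sqrt{\varrho} \sqrt{\varsigma}|$. Here, it will be useful for us to adopt a logarithmic
measure and define

$$ d_F(\varrho,\varsigma) = -\log \operatorname{Tr} |\sqrt{\varrho} \sqrt{\varsigma}| \tag{25} $$
$$ \phantom{d_F(\varrho,\varsigma)} = -\log \operatorname{Tr} \sqrt{\sqrt{\varrho}\, \varsigma \sqrt{\varrho}}. \tag{26} $$

It is known that $d_F(\varrho,\varsigma)$ is related to $d_C(\varrho,\varsigma)$ and $d_B(\varrho,\varsigma)$
by the following inequalities

$$ d_F(\varrho,\varsigma) \le d_B(\varrho,\varsigma) \le d_C(\varrho,\varsigma) \le 2 d_F(\varrho,\varsigma) \le 2 d_B(\varrho,\varsigma). \tag{27} $$

Both the conditions $d_C(\varrho,\varsigma) = d_F(\varrho,\varsigma)$ and $d_C(\varrho,\varsigma) =
2 d_B(\varrho,\varsigma)$ are possible for properly chosen density operators
$\varrho$ and $\varsigma$.

From this point on, we may omit the subscripts $\varrho$ and $\varsigma$ and
simply write $\mu(s)$ for $\mu_{\varrho,\varsigma}(s)$ when there is no ambiguity on
the density operators that we are considering.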
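As a quick sanity check of the chain of inequalities between the fidelity-based, Bhattacharyya and Chernoff distances, one can work out the case of two non-orthogonal, non-identical pure states. This worked example is added for illustration and is not taken from the paper; the vectors $|a\rangle$ and $|b\rangle$ are hypothetical unit vectors.

```latex
% Two pure states with 0 < |<a|b>| < 1.
\varrho = |a\rangle\langle a|, \qquad \varsigma = |b\rangle\langle b| .
% Projectors satisfy \varrho^{t} = \varrho for every t > 0, hence for 0 < s < 1
\mu_{\varrho,\varsigma}(s) = \log \operatorname{Tr} \varrho\,\varsigma
                           = \log |\langle a | b \rangle|^{2},
\qquad
d_C(\varrho,\varsigma) = d_B(\varrho,\varsigma) = -\log |\langle a | b \rangle|^{2} .
% For the fidelity, |\varrho\varsigma| = \sqrt{\varsigma\varrho\varsigma}
%                                      = |\langle a|b\rangle|\,\varsigma, so
\operatorname{Tr} |\sqrt{\varrho}\sqrt{\varsigma}| = |\langle a | b \rangle|,
\qquad
d_F(\varrho,\varsigma) = -\log |\langle a | b \rangle| .
% Therefore
d_F < d_B = d_C = 2\, d_F < 2\, d_B .
```

In this case the third inequality of the chain holds with equality, while commuting operators reduce to the classical case, where the fidelity-based and the Bhattacharyya distances coincide.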

E. Notation 

Footnote (to the definition of $d_F$): Usually the quantity
$2(1 - \operatorname{Tr}|\sqrt{\varrho}\sqrt{\varsigma}|)$ is called Bures distance. We
use the notation $d_F$, with F for fidelity, to avoid ambiguities with the
Bhattacharyya distance.

The following list summarizes the notation used in this
paper:

$K$ is the size of the input alphabet of a channel,
   $\{1, 2, \ldots, K\}$ here;
$J$ is the size of the output alphabet of a classical
   channel, $\{1, 2, \ldots, J\}$ here;
$P(j|k)$ is the transition probability from input symbol $k$ to
   output symbol $j$ of a classical channel;
$N$ is the block length of the code;
$M$ is the number of codewords, i.e., the size of the code;
$R$ is the rate;
$w$ is a codeword $(k_1, \ldots, k_N)$ of $N$ input indices;
$w_1, \ldots, w_M$ are the codewords of the code;
$\mathbf{p}$ is a probability distribution $(p_1, \ldots, p_K)$ on the
   input alphabet. This can for example represent the
   composition of the codewords;
$P_e$ is the probability of error. $P_{e|m}$ is the probability of
   error for codeword $m$, $P_{e,\max}$ is the maximum of
   these values over all codewords;
$E(R)$ is the true reliability function such that $P_{e,\max} \approx e^{-N E(R)}$;
$E_{sp}(R)$ is the sphere-packing bound such that $E(R) \le
   E_{sp}(R^-)$;
$C$ is the ordinary capacity of a channel;
$C_0$ is the zero-error capacity;
$R_\rho$ is the "generalized cut-off rate" in the sense of [25].
   $R_0 = C$, $R_1$ is the cutoff rate, $R_\infty$ is the point at
   which the sphere-packing bound diverges;
$\varrho$, $\varsigma$ are two general density operators;
$\rho$, $\sigma$ are operators such that, for a particular case of $\varrho$ and
   $\varsigma$, we can write $\varrho = \rho^{\otimes N}$ and $\varsigma = \sigma^{\otimes N}$;
$\mathcal{H}$ is the Hilbert space of signal states for a classical-quantum
   channel;
$S_k$ (for $k = 1, \ldots, K$) is the signal state in $\mathcal{H}$ associated
   to input $k$;
$|\psi_k\rangle$ is the vector associated to input symbol $k$ in a pure-state
   channel. Also used for representing probability
   distributions of a classical channel by means of unit
   norm vectors;
$S_w$ is the signal associated to $w$ for a classical-quantum
   channel, that is $S_w = S_{k_1} \otimes S_{k_2} \otimes \cdots \otimes S_{k_N}$;
$\Pi$ is an operator used for a quantum decision measurement;
$\Pi_m$ is the operator associated to the decision for decoding
   $w_m$;
$f$ is an auxiliary output state used as a reference,
   playing the same role of $f$ in [9]. With some abuse
   of notation, we also use this symbol for the vector
   which represents the handle of a representation in
   the Lovász sense. This is done on purpose to emphasize
   the connection between the two quantities;
s is the parameter in the Renyi divergence as used in 
0, Ipl, see below; 
$\mu(s)$ is the Rényi divergence between distributions or
   density operators, depending on the context. When
   necessary, the two distributions or operators are
   explicitly indicated, as in $\mu_{f,\varrho}(s) = \log \operatorname{Tr} f^{1-s} \varrho^{s}$;
$d_C(\cdot,\cdot)$ is the Chernoff distance;
$d_B(\cdot,\cdot)$ is the Bhattacharyya distance;
$d_F(\cdot,\cdot)$ is the distance based on the fidelity;
$\rho$ is the parameter $s/(1 - s)$ as defined in [15] (called
s in HMD; 

III. A Preview: an "umbrella" bound 

In this section, we present an upper bound to $E(R)$ for
classical channels that can be interpreted as an extension of
Lovász's work in the direction of giving at least a crude upper
bound to $E(R)$ for those rates that Lovász's own work proves
to be strictly larger than the zero-error capacity. The intent,
however, is to obtain Lovász's bound as a consequence of an
upper bound on $E(R)$ and not vice versa. The obtained bound
on $E(R)$ is loose at high rates, but it has two important merits.
First, it makes immediately clear how Lovász's idea can be
extended to find an upper bound to $E(R)$ that will give $C_0 \le
\vartheta$ as a direct consequence. Second, it reveals an important
analogy between the Lovász theta function and the cut-off
rate. We will in fact introduce a function $\vartheta(\rho)$ that varies from
the cutoff rate of the channel, when $\rho = 1$, to the Lovász
theta function, when $\rho \to \infty$. The idea is to keep Lovász's
construction in mind as a target but building it as the limit
of a smoother construction. This construction is related to the
construction of Gallager's expurgated lower bound to $E(R)$,
but is used precisely in the opposite direction. We believe
these analogies could shed new light on the understanding of
the topic and deserve further study.

A. Bhattacharyya distances and scalar products 

In deriving the desired bound, we will start our interpreta- 
tion of classical-quantum channels as auxiliary mathematical 
tools for the study of classical channels. Contrary to what
may be considered the most traditional approach, however, we
will not interpret our channel's transition probabilities as the 



eigenvalues of positive semidefinite commuting operators. We 
will instead consider the transition probabilities as the squared 
absolute values of the components of some wave functions, 
and it is thus more instructive to initially consider only pure- 
state channels. In this direction, we also need to recall briefly 
some important connections between the reliability function 
E{R) and the Bhattacharyya distance between codewords. 
This connection is of great importance since the Bhattacharyya 
distance between distributions is related to a scalar product 
between unit norm vectors in a Hilbert space. It is this property 
that makes an adaptation of Lovasz's approach to study the 
function E{R) possible. 

For a generic input symbol $k$, consider the unit norm vector

$$ \psi_k = \big( \sqrt{P(1|k)}, \sqrt{P(2|k)}, \ldots, \sqrt{P(J|k)} \big)^{T} \tag{28} $$

of the square roots of the conditional probabilities of the
output symbols given input $k$. We call this the state vector
of input symbol $k$, in obvious analogy with the input signals
of pure-state classical-quantum channels. Consider then the
memoryless $N$-fold extension of our classical channel, that
is, for an input sequence $\mathbf{x} = (k_1, k_2, \ldots, k_N)$, consider
the square root of the conditional probability of a sequence
$\mathbf{y} = (j_1, j_2, \ldots, j_N)$

$$ \sqrt{P(\mathbf{y}|\mathbf{x})} = \prod_{n=1}^{N} \sqrt{P(j_n|k_n)}. \tag{29} $$

If all $\mathbf{y}$ sequences are listed in alphanumeric order, we can
express all the square roots of their conditional probabilities
as the components of the vector

$$ \Psi_{\mathbf{x}} = \psi_{k_1} \otimes \psi_{k_2} \otimes \cdots \otimes \psi_{k_N}. \tag{30} $$

We call this vector the state vector of the input sequence
$\mathbf{x}$, again in analogy with classical-quantum channels. Let for
simplicity $\Psi_m$ be the state vector of the codeword $\mathbf{x}_m$; then
we can represent our code $\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M\}$ by means of the
associated state vectors $\{\Psi_1, \Psi_2, \ldots, \Psi_M\}$. Since all square
roots are taken positive, note that our classical channel has a
positive zero-error capacity if and only if there are at least two
state vectors $\psi_i$, $\psi_k$ such that $\langle \psi_i | \psi_k \rangle = 0$. This implies that
codes can be built such that $\langle \Psi_m | \Psi_{m'} \rangle = 0$ for some $m, m'$,
that is, the two codewords $m$ and $m'$ cannot be confused at the
output. However, the scalar product $\langle \Psi_m | \Psi_{m'} \rangle$ plays a more
general role since it is related to the so-called Bhattacharyya
distance between the two codewords $m$ and $m'$. In particular,
in a binary hypothesis test between codewords $m$ and $m'$,
an extension of the Chernoff bound allows one to assert that the
minimum error probability $P_e$ asymptotically satisfies [10]

$$ \log \frac{1}{P_e} \approx -\log \min_{0 \le s \le 1} \sum_{\mathbf{y}} P(\mathbf{y}|\mathbf{x}_m)^{1-s} P(\mathbf{y}|\mathbf{x}_{m'})^{s}. \tag{31} $$

It is easily shown that this quantity is always between
$-\log \langle \Psi_m | \Psi_{m'} \rangle$ and $-2 \log \langle \Psi_m | \Psi_{m'} \rangle$, and it equals the
former for a class of channels, called pairwise reversible channels,
that have some symmetry with respect to the input symbols
(which tautologically means that the minimum is achieved for
$s = 1/2$). Obviously, for a given code, the probability of error
$P_{e,\max}$ is lower bounded by the probability of error in each
binary hypothesis test between two codewords. Hence, we find
that

$$ \log \frac{1}{P_{e,\max}} \le \min_{m \neq m'} \, -2 \log \langle \Psi_m | \Psi_{m'} \rangle \tag{32} $$

and, for pairwise reversible channels,

$$ \log \frac{1}{P_{e,\max}} \le \min_{m \neq m'} \, -\log \langle \Psi_m | \Psi_{m'} \rangle. \tag{33} $$

It is thus obvious that it is possible to upper bound $E(R)$ by
lower bounding the quantity

$$ \gamma = \max_{m \neq m'} \langle \Psi_m | \Psi_{m'} \rangle. \tag{34} $$

Lovász's work aims at finding a value $\vartheta$ as small as possible
that allows one to conclude that, for a set of $M = e^{NR} > e^{N\vartheta}$
codewords, $\gamma$ cannot be zero, and thus at least two codewords
are confusable. Here, instead, we want something more, that
is, finding a lower bound on $\gamma$ for each code with rate $R > \vartheta$,
so as to deduce an upper bound to $E(R)$ for all $R > \vartheta$.
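The state-vector construction of this subsection can be sketched as follows. The code builds the single-letter vectors of square-root probabilities, forms their Kronecker products for input sequences, and checks that the scalar product of two sequence state vectors factorizes over the positions. The function names and the five-input example channel are assumptions made for this illustration.

```python
import math

def state_vector(P, k):
    """psi_k: square roots of the transition probabilities P(.|k)."""
    return [math.sqrt(x) for x in P[k]]

def kron(u, v):
    """Kronecker product of two vectors given as lists."""
    return [a * b for a in u for b in v]

def sequence_state(P, x):
    """Psi_x = psi_{k_1} (x) ... (x) psi_{k_N} for an input sequence x."""
    vec = [1.0]
    for k in x:
        vec = kron(vec, state_vector(P, k))
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Illustrative channel (an assumption): input k goes to output k or
# (k + 1) mod 5 with probability 1/2 each.
P = [[0.5 if j in (k, (k + 1) % 5) else 0.0 for j in range(5)]
     for k in range(5)]

x1, x2 = (0, 1), (1, 1)
overlap = dot(sequence_state(P, x1), sequence_state(P, x2))
factored = math.prod(dot(state_vector(P, a), state_vector(P, b))
                     for a, b in zip(x1, x2))
zero_overlap = dot(sequence_state(P, (0, 0)), sequence_state(P, (2, 0)))
```

Here `zero_overlap` vanishes because inputs 0 and 2 are not confusable in the first position, so the two sequences can be distinguished without error, while `overlap` is the Bhattacharyya coefficient between the output distributions of the two sequences.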

B. The umbrella bound 

Consider the scalar products between the channel state
vectors, $\langle \psi_k | \psi_i \rangle \ge 0$. For a fixed $\rho \ge 1$, consider then
a set of $K$ "tilted" state vectors, that is, unit norm
vectors $\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_K$ in any Hilbert space $\mathcal{H}$ such that
$|\langle \tilde\psi_i | \tilde\psi_j \rangle| \le \langle \psi_i | \psi_j \rangle^{1/\rho}$. We call such a set of vectors $\{\tilde\psi_k\}$
an orthonormal representation of degree $\rho$ of our channel,
and call $\Gamma(\rho)$ the set of all possible such representations

$$ \Gamma(\rho) = \big\{ \{\tilde\psi_k\} : |\langle \tilde\psi_i | \tilde\psi_j \rangle| \le \langle \psi_i | \psi_j \rangle^{1/\rho} \big\}. \tag{35} $$

Observe that $\Gamma(\rho)$ is non-empty since the original $\psi_k$ vectors
satisfy the constraints. The value of an orthonormal representation
is the quantity

$$ V(\{\tilde\psi_k\}) = \min_{f} \max_k \log \frac{1}{|\langle \tilde\psi_k | f \rangle|^2} \tag{36} $$

where the minimum is over all unit norm vectors $f$. The
optimal choice of the vector $f$ is called, with Lovász, the
handle of the representation. We call it $f$ to point out that this
vector plays essentially the same role as the auxiliary output
distribution $f$ used in the sphere-packing bound of [9], a role
that will be played by an auxiliary density operator $f$ later on
in Section V.

Call now $\vartheta(\rho)$ the minimum value over all representations
of degree $\rho$,

$$ \vartheta(\rho) = \min_{\{\tilde\psi_k\} \in \Gamma(\rho)} V(\{\tilde\psi_k\}) \tag{37} $$
$$ \phantom{\vartheta(\rho)} = \min_{\{\tilde\psi_k\} \in \Gamma(\rho)} \min_{f} \max_k \log \frac{1}{|\langle \tilde\psi_k | f \rangle|^2}. \tag{38} $$

This function $\vartheta(\rho)$ allows us to describe an upper bound to
$E(R)$ for each value of $\rho$, that we call umbrella bound. Later in
this paper, in Section VIII, we will interpret this bound from a
different perspective, and we will introduce an evolution based
on the sphere-packing bound. We have the following result.



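Theorem 1: For any code of block-length $N$ with $M$ codewords
and any $\rho \ge 1$ we have

$$ \max_{m} \sum_{m' \neq m} \langle \Psi_m | \Psi_{m'} \rangle \ge (M - 1) \left( \frac{M e^{-N\vartheta(\rho)} - 1}{M - 1} \right)^{\rho}. \tag{39} $$

Corollary 1: For the reliability function of a general DMC
we have the bound

$$ E(R) \le 2\rho\,\vartheta(\rho), \qquad R > \vartheta(\rho). \tag{40} $$

If the channel is pairwise reversible, we can strengthen the
bound to

$$ E(R) \le \rho\,\vartheta(\rho), \qquad R > \vartheta(\rho). \tag{41} $$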

Proof: Note that, for an optimal representation $\{\tilde\psi_k\}$
of degree $\rho$ with handle $f$, we have $|\langle \tilde\psi_k | f \rangle|^2 \ge e^{-\vartheta(\rho)}$,
$\forall k$. Set now $F = f^{\otimes N}$ and, for an input sequence $\mathbf{x} =
(k_1, k_2, \ldots, k_N)$, call, in analogy with (30), $\tilde\Psi_{\mathbf{x}} = \tilde\psi_{k_1} \otimes \tilde\psi_{k_2} \otimes
\cdots \otimes \tilde\psi_{k_N}$. Observe that we have

$$ |\langle \tilde\Psi_{\mathbf{x}} | F \rangle|^2 = \prod_{n=1}^{N} |\langle \tilde\psi_{k_n} | f \rangle|^2 \tag{42} $$
$$ \phantom{|\langle \tilde\Psi_{\mathbf{x}} | F \rangle|^2} \ge e^{-N\vartheta(\rho)}. \tag{43} $$

This is the key step which is central to both Lovász's approach
and to the sphere-packing bound: the construction of an
auxiliary state which is "close" to all possible states associated
to any sequence. In this case the states are close in terms of
scalar product, while in the sphere-packing bound they will
be close in terms of the more general Rényi divergence. The
basic idea, however, is not different.

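Let us first check how Lovász's bound is obtained. Lovász's
approach is to bound the number $M$ of codewords with orthogonal
state vectors, using the property that if $\tilde\Psi_1, \ldots, \tilde\Psi_M$
form an orthonormal set, then

$$ 1 = \langle F | F \rangle \tag{44} $$
$$ \phantom{1} \ge \sum_{m=1}^{M} |\langle \tilde\Psi_m | F \rangle|^2 \tag{45} $$
$$ \phantom{1} \ge M e^{-N\vartheta(\rho)}. \tag{46} $$

Hence, if $M > e^{N\vartheta(\rho)}$, there are at least two non-orthogonal
vectors in the set, say $|\langle \tilde\Psi_m | \tilde\Psi_{m'} \rangle|^2 > 0$. But this implies
that $\langle \Psi_m | \Psi_{m'} \rangle \ge |\langle \tilde\Psi_m | \tilde\Psi_{m'} \rangle|^{\rho} > 0$. Hence, if $R >
\vartheta(\rho)$, no zero-error code can exist. We still have the freedom
in the choice of $\rho$ and it is obvious that larger values of $\rho$
can only give better results. Hence, it is preferable to simply
work in the limit of $\rho \to \infty$ and thus build the representation
$\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_K$ under the only constraint that $|\langle \tilde\psi_i | \tilde\psi_j \rangle| = 0$
whenever $\langle \psi_i | \psi_j \rangle = 0$. This gives precisely Lovász's result.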

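Now, instead of bounding $R$ under the hypothesis of zero-error
communication, we want to bound the probability of
error for a given $R > \vartheta(\rho)$. Considering the tilted state vectors
of the code, we can write

$$ e^{-N\vartheta(\rho)} \le |\langle \tilde\Psi_m | F \rangle|^2 \tag{47} $$
$$ \phantom{e^{-N\vartheta(\rho)}} = \langle F | \big( |\tilde\Psi_m\rangle \langle \tilde\Psi_m| \big) | F \rangle. \tag{48} $$

The second expression above has the benefit of easily allowing
averaging this expression over different codewords. So, we can
average this expression over all $m$ and, defining the matrix
$\tilde\Psi = (\tilde\Psi_1, \ldots, \tilde\Psi_M)$, we get

$$ \langle F | \frac{\tilde\Psi \tilde\Psi^{\dagger}}{M} | F \rangle \ge e^{-N\vartheta(\rho)}. \tag{49} $$

Since $F$ is a unit norm vector, this implies that the matrix
$\tilde\Psi \tilde\Psi^{\dagger} / M$ has at least one eigenvalue larger than or equal to
$e^{-N\vartheta(\rho)}$. This in turn implies that also the matrix $\tilde\Psi^{\dagger} \tilde\Psi / M$ has
itself an eigenvalue larger than or equal to $e^{-N\vartheta(\rho)}$ (the two
matrices share the same non-zero eigenvalues), that is

$$ \lambda_{\max} \left( \frac{\tilde\Psi^{\dagger} \tilde\Psi}{M} \right) \ge e^{-N\vartheta(\rho)}. \tag{50} $$

It is known that for a given matrix $A$, the following inequality
holds

$$ \lambda_{\max}(A) \le \max_{i} \sum_{j} |A_{i,j}|. \tag{51} $$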
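The eigenvalue inequality (51), which bounds the largest eigenvalue of a matrix by its largest absolute row sum, can be checked numerically on a small Gram matrix. The following sketch uses plain power iteration; the function names and the example matrix are assumptions made for this illustration.

```python
import math

def lambda_max(A, iters=500):
    """Largest eigenvalue of a symmetric PSD matrix by power iteration."""
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(v, Av))   # Rayleigh quotient

def max_abs_row_sum(A):
    """Right-hand side of (51): max_i sum_j |A_ij|."""
    return max(sum(abs(x) for x in row) for row in A)

# Gram matrix of three unit vectors (illustrative values, an assumption).
A = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
lam, bound = lambda_max(A), max_abs_row_sum(A)
```

For this matrix the largest eigenvalue is $1 + \sqrt{2}/2$, below the row-sum bound of 2. In the proof the same inequality is applied to the normalized Gram matrix of the tilted codeword vectors.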



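Using this inequality with $A = \tilde\Psi^{\dagger} \tilde\Psi / M$ we get

$$ e^{-N\vartheta(\rho)} \le \lambda_{\max} \left( \frac{\tilde\Psi^{\dagger} \tilde\Psi}{M} \right) \tag{52} $$
$$ \phantom{e^{-N\vartheta(\rho)}} \le \max_{m} \frac{1}{M} \sum_{m'} |\langle \tilde\Psi_m | \tilde\Psi_{m'} \rangle|. \tag{53} $$

We then deduce

$$ \frac{M e^{-N\vartheta(\rho)} - 1}{M - 1} \le \max_{m} \frac{1}{M - 1} \sum_{m' \neq m} |\langle \tilde\Psi_m | \tilde\Psi_{m'} \rangle| \tag{54} $$
$$ \phantom{\frac{M e^{-N\vartheta(\rho)} - 1}{M - 1}} \le \max_{m} \frac{1}{M - 1} \sum_{m' \neq m} \langle \Psi_m | \Psi_{m'} \rangle^{1/\rho} \tag{55} $$
$$ \phantom{\frac{M e^{-N\vartheta(\rho)} - 1}{M - 1}} \le \max_{m} \left( \frac{1}{M - 1} \sum_{m' \neq m} \langle \Psi_m | \Psi_{m'} \rangle \right)^{1/\rho} \tag{56} $$

where the first step isolates the diagonal terms $\langle \tilde\Psi_m | \tilde\Psi_m \rangle = 1$,
the second step uses $|\langle \tilde\Psi_m | \tilde\Psi_{m'} \rangle| \le \langle \Psi_m | \Psi_{m'} \rangle^{1/\rho}$, which follows
by taking the product of the single-letter constraints, and
the last step is due to the Jensen inequality, since
$\rho \ge 1$. Extracting the sum from this inequality we obtain
the inequality stated in the theorem.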
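To prove the corollary, simply note that, with $M = e^{NR}$,

$$ \max_{m \neq m'} \langle \Psi_m | \Psi_{m'} \rangle \ge \max_{m} \frac{1}{M - 1} \sum_{m' \neq m} \langle \Psi_m | \Psi_{m'} \rangle \tag{57} $$
$$ \phantom{\max_{m \neq m'} \langle \Psi_m | \Psi_{m'} \rangle} \ge \left( \frac{M e^{-N\vartheta(\rho)} - 1}{M - 1} \right)^{\rho} \tag{58} $$
$$ \phantom{\max_{m \neq m'} \langle \Psi_m | \Psi_{m'} \rangle} \ge \left( e^{-N\vartheta(\rho)} - e^{-NR} \right)^{\rho}. \tag{59} $$

The bound is trivial if $R < \vartheta(\rho)$. If $R > \vartheta(\rho)$, we
deduce again Lovász's result that there are two non-orthogonal
codewords. But now we also have some further information;
for $R > \vartheta(\rho)$, the second term in the parenthesis decreases
exponentially faster than the first, which leads us to the
conclusion that

$$ \frac{1}{N} \min_{m \neq m'} \log \frac{1}{\langle \Psi_m | \Psi_{m'} \rangle} \le \rho\,\vartheta(\rho) + o(1). \tag{60} $$

The bounds in terms of $E(R)$ are then obtained by simply
taking the limit $N \to \infty$ and using the bounds (32) and (33).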

Remark 1: In passing from the theorem to the corollary, 
we have essentially substituted the maximum Bhattacharyya 



distance between codewords for the largest average distance 
from one codeword to the remaining ones. The reason for 
doing this is that we are unable to bound E{R) more efficiently 
in terms of the average distance than in terms of the maximum 
distance although intuition suggests that it should be possible 
to do it. This is related to the tightness of the union bound; it 
is my belief that this step is crucial and that improvements in 
this sense could give important enhancements. 



C. Relation to known classical quantities 

We now study the behaviour of $\vartheta(\rho)$ for different values of
$\rho$. A first important comment is about the result obtained for
$\rho = 1$; the value $\vartheta(1)$ is simply the cut-off rate of the channel.
Indeed, for $\rho = 1$, we can without loss of generality use
the obvious representation $\tilde\psi_k = \psi_k$, $\forall k$, since any different
optimal representation will simply be a rotation of this (or an
equivalent description in a space with a different dimension).
In this case, all the components of all the vectors $\{\tilde\psi_k\}$ are
non-negative and this easily implies that the optimal $f$ can as
well be chosen with non-negative components, since changing
a supposedly negative component of $f$ to its absolute value can
only improve the result. Thus, $f$ can be written as the square
root of a probability distribution $Q$ and we have

$$ \vartheta(1) = \min_{f} \max_k \log \frac{1}{|\langle \psi_k | f \rangle|^2} \tag{61} $$
$$ \phantom{\vartheta(1)} = \min_{Q} \max_k \left[ -2 \log \sum_j \sqrt{P(j|k)\, Q(j)} \right] \tag{62} $$

where the minimum is now over all probability distributions $Q$.
As observed by Csiszár [25, Proposition 1, with $\alpha = 1/2$], this
expression equals the cut-off rate $R_1$ of the channel defined
as

$$ R_1 = \max_{\mathbf{p}} \, -\log \sum_{i,k} p_k p_i \Big( \sum_j \sqrt{P(j|k) P(j|i)} \Big) $$
$$ \phantom{R_1} = \max_{\mathbf{p}} \, -\log \sum_{i,k} p_k p_i \langle \psi_k | \psi_i \rangle. $$

The identity $\vartheta(1) = R_1$ will be discussed again later in light of
the new interpretation that we will give of $\vartheta(\rho)$ after studying
the sphere-packing bound. We will see that it represents a
nice connection between a classical channel and a pure-state
classical-quantum channel possibly underlying the classical
one.
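The cut-off rate expression in terms of scalar products of state vectors is simple to evaluate for a given input distribution. The sketch below does this for a binary symmetric channel with a uniform input, which is the maximizing distribution for that symmetric channel; the function names and the crossover probability are assumptions made for this illustration, and no maximization over the input distribution is attempted.

```python
import math

def overlap(P, k, i):
    """<psi_k|psi_i> = sum_j sqrt(P(j|k) P(j|i))."""
    return sum(math.sqrt(a * b) for a, b in zip(P[k], P[i]))

def cutoff_objective(P, p):
    """-log sum_{i,k} p_k p_i <psi_k|psi_i> for a given input distribution p."""
    K = len(P)
    total = sum(p[k] * p[i] * overlap(P, k, i)
                for k in range(K) for i in range(K))
    return -math.log(total)

# Binary symmetric channel with crossover probability eps (illustrative).
eps = 0.1
BSC = [[1 - eps, eps], [eps, 1 - eps]]
r1 = cutoff_objective(BSC, [0.5, 0.5])

# Closed form for the BSC with uniform input, in nats.
closed_form = math.log(2) - math.log(1 + 2 * math.sqrt(eps * (1 - eps)))
```

With `eps = 0.1` the scalar product between the two state vectors is 0.6, so both expressions equal $\log 2 - \log 1.6$.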

Footnote: We use the notation $R_1$ for the cut-off rate, instead of the more common
$R_0$, for notational needs that will become clear in Section VII.

Another important characteristic of the function $\vartheta(\rho)$ is
observed in the limit $\rho \to \infty$. In the limit, the only
constraint on the representations is that $|\langle \tilde\psi_i | \tilde\psi_j \rangle| = 0$ whenever
$\langle \psi_i | \psi_j \rangle = 0$. Hence, when $\rho \to \infty$, the set of possible
representations is precisely the same considered by Lovász
[11], and we thus have $\vartheta(\rho) \to \vartheta$ as $\rho \to \infty$. So, the value
of $\vartheta(\rho)$ moves from the cut-off rate $R_1$ to the Lovász bound
$\vartheta$ when $\rho$ varies from 1 to $\infty$. This clearly implies that our
bound to $E(R)$ is finite for all $R > \vartheta$ and thus it allows us to
bound the zero-error capacity of the channel as

$$ C_0 \le \lim_{\rho \to \infty} \vartheta(\rho) \tag{63} $$
$$ \phantom{C_0} = \vartheta. \tag{64} $$

In order to understand what happens for intermediate values,
it is instructive to consider first a class of channels introduced
by Jelinek [37] and later also studied by Blahut [38]. These
are channels for which the matrix $C$ with elements
$C_{i,j} = \langle \psi_i | \psi_j \rangle^{1/\rho}$ is positive semidefinite for all $\rho \ge 1$. It
was proved by Jelinek that, for these channels, the expurgated
bound of Gallager [15] is invariant over $n$-fold extensions of
the channel, that is, it has the same form when computed on
a single channel use or on multiple channel uses (this is not
true in general). Thus, if the conjecture made in [9, p. 77],
that the expurgated bound computed on the $n$-fold channel
is tight asymptotically when $n \to \infty$, is true, then for these
channels the reliability would be known exactly since it equals
the expurgated bound for the single use channel. It is also
known that for these channels, the inputs can be partitioned
in subsets such that all pairs of symbols from the same subset
are confusable and no pair of symbols from different subsets
are confusable. The zero-error capacity in this case is simply
the (log of the) number of such subsets. For these channels,
since the matrix $C$ is positive semidefinite, there exists a set
of vectors $\tilde\psi_1, \tilde\psi_2, \ldots, \tilde\psi_K$ such that $\langle \tilde\psi_i | \tilde\psi_j \rangle = C_{i,j}$, that is,
for all $\rho \ge 1$, representations of degree $\rho$ exist that satisfy all
the constraints with equality. In this case, the equivalence with
the cut-off rate that we have seen for $\rho = 1$ can be in a sense
extended to other $\rho$ values. We will in fact see in Section VII
that we can write

$$ \vartheta(\rho) = \min_{f} \max_k \log \frac{1}{|\langle \tilde\psi_k | f \rangle|^2} \tag{65} $$
$$ \phantom{\vartheta(\rho)} = \max_{\mathbf{p}} \, -\log \sum_{i,k} p_k p_i \langle \tilde\psi_k | \tilde\psi_i \rangle \tag{66} $$
$$ \phantom{\vartheta(\rho)} = \max_{\mathbf{p}} \, -\log \sum_{i,k} p_k p_i \langle \psi_k | \psi_i \rangle^{1/\rho}. \tag{67} $$

Hence, under such circumstances, we find that $\vartheta(\rho) =
E_x(\rho)/\rho$, where $E_x(\rho)$ is the value of the coefficient used
in the expurgated bound of Gallager [15, eq. (87)]

$$ E(R) \ge E_x(\rho) - \rho R, \qquad \rho \ge 1. \tag{68} $$

Note that, for each $\rho$, this bound is a straight line which
intercepts the $R$ and $E$ axes at the points $E_x(\rho)/\rho$ and $E_x(\rho)$
respectively, which equal $\vartheta(\rho)$ and $\rho\,\vartheta(\rho)$. Hence, if the
channel is pairwise reversible, then our bound is obtained by
drawing the curve parameterized as $(E_x(\rho)/\rho, E_x(\rho))$ in the
$(R, E)$ plane. This automatically implies that we obtain the
bound

$$ C_0 \le \lim_{\rho \to \infty} \frac{E_x(\rho)}{\rho} \tag{69} $$

which gives the precise value of the zero-error capacity in this
case (which is however trivial) and, if $C_0 = 0$, the bound

$$ E(0) \le \lim_{\rho \to \infty} 2\rho\,\vartheta(\rho) \tag{70} $$
$$ \phantom{E(0)} = \lim_{\rho \to \infty} 2 E_x(\rho) \tag{71} $$
$$ \phantom{E(0)} = 2 E_{ex}(0). \tag{72} $$

If the channel is pairwise reversible, this can then be improved
to $E(R) \le E_{ex}(0)$, which is obviously tight.

For general channels with a non-trivial zero-error capacity,
like for example a channel whose confusability graph is a
pentagon, what happens is that the matrix $C$ is in general
positive semidefinite only for not too large values of $\rho$ and
then it becomes not positive semidefinite for $\rho$ large enough.
This implies that representations that satisfy the constraints
with equality exist only for $\rho$ not too large. For larger $\rho$, the
two expressions in equations (66) and (67) are no longer equal
and in general they could both differ from $\vartheta(\rho)$. If all the
values $\langle \tilde\psi_k | \tilde\psi_i \rangle$ are nonnegative (see the footnote), however, then the expression
in (66) equals $\vartheta(\rho)$, see Theorem 9. In this case, we see the
interesting difference between $\vartheta(\rho)$ and $E_x(\rho)/\rho$. The two
quantities follow respectively (66) and (67); when $\rho \to \infty$, the
first one tends to $\vartheta$, an upper bound to $C_0$, while the second
one tends to the logarithm of the independence number of the confusability
graph of the channel, a lower bound to $C_0$ (see [39]).

It is worth pointing out that, for some channels (for example
all channels whose confusability graph is a pentagon), the
optimal representation may even stay fixed for $\rho$ larger than
some given finite value $\rho_{\max}$. In this case, the bound is useless
for $\rho > \rho_{\max}$.

A final comment is about the computation of this bound.
There is no essential difference with respect to the evaluation
of the Lovász theta function. The optimal representation $\{\tilde\psi_k\}$,
for any fixed $\rho$, can be obtained by solving a semidefinite
optimization problem. If we consider the $(K + 1) \times (K + 1)$
Gram matrix

$$ G = [\tilde\psi_1, \ldots, \tilde\psi_K, f]^{\dagger} [\tilde\psi_1, \ldots, \tilde\psi_K, f] \tag{73} $$

we note that finding the optimal representation amounts to
solving the problem

$$ \begin{array}{lll}
\max & v & \\
\text{s.t.} & G(k, K+1) \ge v, & \forall k \le K \\
 & G(k, k) = 1, & \forall k \\
 & |G(k, i)| \le \langle \psi_k | \psi_i \rangle^{1/\rho}, & 1 \le k \le K, \; k < i \le K \\
 & G \text{ is positive semidefinite.} &
\end{array} \tag{74} $$

The solution to this problem gives the value for the optimal
representation and both the representation vectors $\{\tilde\psi_k\}$ and
the handle $f$ can be obtained by means of the spectral
decomposition of the optimal $G$ found.

Footnote: We conjecture that the optimal representation, in terms of Lovász's
definition of value, always satisfies this condition. We have not yet investigated
this aspect, but have never found a counterexample.

D. Relation to classical-quantum channels

In deriving the umbrella bound in this section, we have
mentioned classical-quantum channels but we have not explicitly
used any of their properties. The derived bound could be
interpreted as a simple variation of Lovász's argument toward
bounding $E(R)$. We decided in any case to use a notation
that suggests an interpretation in terms of classical-quantum
channels because, as we will see later in the paper, the bound
derived here is a special case of a more general bound that
can be derived by properly applying the sphere-packing bound
for classical-quantum channels.

In particular, while the construction of the representation
$\{\tilde\psi_k\}$ appears in this section as a purely mathematical trick
to bound $E(R)$ by means of a geometrical representation of
the channel, it will appear evident from the results of Section
VIII that in the context of classical-quantum channels this
procedure is a natural way to bound $E(R)$ by comparing
the original channel with an auxiliary one. In the classical
case, Lovász's result came completely unexpected since it
involves the unconventional idea of using vectors with negative
components to play the same role of $\sqrt{P(j|k)}$. When
formulated in the classical-quantum setting, however, this
approach becomes completely transparent and does not require
pushing imagination out of the original domain. We may
say that classical-quantum channels are to classical channels
as complex numbers are to real numbers. In this analogy,
Lovász's theta function is like Cardan's solution for cubics,
or like a singularity in the complex domain that explains the
radius of convergence of a power series on the real line.

IV. Quantum Binary Hypothesis Testing 

In this section, we consider the problem of binary hypothesis
testing between quantum states. In particular, we will prove
a quantum extension of the converse part of the Shannon-Gallager-Berlekamp
generalized Chernoff bound [9, Th. 5].
This is a fundamental tool in bounding the probability of error
for codes over classical-quantum channels and it will thus play
a central role in Sections V and IX for the proof of the
sphere-packing bound and of the zero-rate bound.

Let $\varrho$ and $\varsigma$ be two density operators in a Hilbert space $\mathcal{H}$.
We are interested in the problem of discriminating between
the hypotheses that a given system is in state $\varrho$ or $\varsigma$. We
suppose here that the two density operators have non-disjoint
supports, for otherwise the problem is trivial. The decision has
to be taken based on the result of a measurement that can be
identified with a pair of positive operators $\{1 - \Pi, \Pi\}$. The
probabilities of error given that the system is in state $\varrho$ or $\varsigma$ are
respectively

$$ P_{e|\varrho} = \operatorname{Tr} \Pi \varrho \quad \text{and} \quad P_{e|\varsigma} = \operatorname{Tr} (1 - \Pi) \varsigma. \tag{75} $$

Of particular importance in quantum statistics is the case
where $\varrho$ and $\varsigma$ are $N$-fold tensor powers of some operators,
that is $\varrho = \rho^{\otimes N}$ and $\varsigma = \sigma^{\otimes N}$ for some operators $\rho$ and $\sigma$. In
this case, one is usually interested in the asymptotic behaviour
of the probability of error as $N$ goes to infinity. The following
result was recently derived in [17], [18] (see also [19]).

Theorem 2 (Quantum Chernoff Bound): Let $\rho$, $\sigma$ be density
operators with Chernoff distance $d_C(\rho, \sigma)$ and let $\eta_1$ and
$\eta_2$ be positive real numbers. For any fixed $N$ let

$$ \bar{P}_e^{(N)} = \inf_{\Pi} \left[ \eta_1 \operatorname{Tr} \Pi \rho^{\otimes N} + \eta_2 \operatorname{Tr} (1 - \Pi) \sigma^{\otimes N} \right] \tag{76} $$

where the infimum is over all measurements. Then

$$ \lim_{N \to \infty} -\frac{1}{N} \log \bar{P}_e^{(N)} = d_C(\rho, \sigma). \tag{77} $$

Note that the coefficients $\eta_1$, $\eta_2$ have no effect on the asymptotic
exponential behavior of the error probability. With fixed
$\eta_1$, $\eta_2$, the optimal projectors are such that the error probabilities
$P_{e|\rho^{\otimes N}}$ and $P_{e|\sigma^{\otimes N}}$ have the same exponential decay in
$N$.

In some occasions, and in particular for the purpose of the 
present paper, it is important to characterize the performance 
of optimal tests when different exponential behaviours for the
two error probabilities are needed. The following result has
been recently obtained as a generalization of the previous 
theorem E^j, QFl 

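Theorem 3: Let $\rho$, $\sigma$ be density operators with non-disjoint
supports and let $\psi = \operatorname{Tr}(\sigma \, \mathrm{supp}(\rho))$. Let $e(r)$ be defined
as

$$ e(r) = \sup_{0 \le s < 1} \frac{-s r - \mu_{\rho,\sigma}(s)}{1 - s}, \qquad r \ge -\log \psi \tag{78} $$

and $e(r) = \infty$ if $r < -\log \psi$. Let $\mathcal{P}$ be the set of all sequences
of operators $\Pi^{(N)}$ such that

$$ \lim_{N \to \infty} -\frac{1}{N} \log \left( \operatorname{Tr} (1 - \Pi^{(N)}) \sigma^{\otimes N} \right) \ge r. \tag{79} $$

Then

$$ \sup_{\{\Pi^{(N)}\} \in \mathcal{P}} \left[ \lim_{N \to \infty} -\frac{1}{N} \log \left( \operatorname{Tr} \Pi^{(N)} \rho^{\otimes N} \right) \right] \le e(r). \tag{80} $$

Furthermore, a sequence $\{\Pi^{(N)}\} \in \mathcal{P}$ exists that satisfies (80)
with equality.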
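For commuting density operators the trade-off function of Theorem 3 reduces to a classical expression in the eigenvalues, which makes it easy to explore numerically. The following is only a sketch under that assumption: the operators are represented by their eigenvalues in a common eigenbasis, both are taken with full support, the supremum is written in the usual Hoeffding-type form with the factor $1/(1-s)$, and it is approximated on a grid. The function names and the numerical values are chosen for this illustration.

```python
import math

def mu(rho, sigma, s):
    """mu_{rho,sigma}(s) = log sum_k rho_k^(1-s) sigma_k^s (commuting case)."""
    return math.log(sum(a ** (1.0 - s) * b ** s for a, b in zip(rho, sigma)))

def hoeffding_exponent(rho, sigma, r, steps=100000):
    """Grid approximation of e(r) = sup_{0<=s<1} (-s r - mu(s)) / (1 - s)."""
    best = -math.inf
    for i in range(steps):               # s = 0, 1/steps, ..., 1 - 1/steps
        s = i / steps
        best = max(best, (-s * r - mu(rho, sigma, s)) / (1.0 - s))
    return best

def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

rho = [0.9, 0.1]      # eigenvalues of rho (illustrative values)
sigma = [0.2, 0.8]    # eigenvalues of sigma in the same eigenbasis

e_at_zero = hoeffding_exponent(rho, sigma, 0.0)
e_at_stein = hoeffding_exponent(rho, sigma, kl(rho, sigma))
```

At $r = 0$ the grid value approaches the divergence $D(\sigma\|\rho)$, and at $r = D(\rho\|\sigma)$ it drops to zero, which are the two extreme points of the trade-off between the two error exponents.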

This generalization of the Chernoff bound, however, is not
yet sufficient for the purpose of the present paper. In channel
coding problems, in fact, what is usually of interest is the
more general problem of distinguishing between two states
that are represented by tensor products of non-identical density
operators, that is

$$ \varrho = \rho_1 \otimes \rho_2 \otimes \cdots \otimes \rho_N \quad \text{and} \quad \varsigma = \sigma_1 \otimes \sigma_2 \otimes \cdots \otimes \sigma_N. \tag{81} $$

In this case, it is clear that the probability of error depends
on the composition of the two states $\varrho$ and $\varsigma$, that is on
the sequences $\rho_1, \rho_2, \ldots$ and $\sigma_1, \sigma_2, \ldots$, and an asymptotic
result of the form of Theorems 2 and 3 is not to be hoped for in
general. For example, after the obvious redefinition of $\bar{P}_e^{(N)}$
in Theorem 2, the limit on the left hand side of (77) may even
not exist.

For this reason, it is useful to establish a more general result
than Theorems 2 and 3 which is stated directly in terms of the
operators $\varrho$ and $\varsigma$. This is precisely what is done in [9, Th.
5] for the classical case and we aim here at deriving at least
the corresponding converse part of that result for the quantum
case.

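Theorem 4 (Quantum Shannon-Gallager-Berlekamp Bound): Let $\varrho$, $\varsigma$ be density operators with non-disjoint supports, let $\Pi$ be a measurement operator for the binary hypothesis test between $\varrho$ and $\varsigma$ and let the probabilities of error $P_{e|\varrho}$, $P_{e|\varsigma}$ be defined as in (75). Let $\mu(s) = \mu_{\varrho,\varsigma}(s)$. Then, for any $0 < s < 1$, either

$$P_{e|\varrho} > \frac{1}{8} \exp\left[ \mu(s) - s\mu'(s) - s\sqrt{2\mu''(s)} \right] \tag{82}$$

or

$$P_{e|\varsigma} > \frac{1}{8} \exp\left[ \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} \right]. \tag{83}$$
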

Proof: This theorem is essentially the combination of the main idea introduced in [18] for proving the converse part of the quantum Chernoff bound and of [9, Th. 5], the classical version of this same theorem. Since some intermediate steps of those proofs are needed, we unroll the details here for the reader's convenience.

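Following [19], let the spectral decomposition of $\varrho$ and $\varsigma$ be respectively

$$\varrho = \sum_i \alpha_i |x_i\rangle\langle x_i| \quad \text{and} \quad \varsigma = \sum_j \beta_j |y_j\rangle\langle y_j| \tag{84}$$

where $\{|x_i\rangle\}$ and $\{|y_j\rangle\}$ are orthonormal bases. First observe that, from the Quantum Neyman-Pearson Lemma ([40], [41]), it suffices to consider orthogonal projectors $\Pi$. So, we have $\Pi = \Pi^2 = \Pi \mathbb{1} \Pi = \sum_j \Pi |y_j\rangle\langle y_j| \Pi$. Symmetrically, we have that $(\mathbb{1} - \Pi) = \sum_i (\mathbb{1} - \Pi) |x_i\rangle\langle x_i| (\mathbb{1} - \Pi)$. So we have

$$\mathrm{Tr}\, \Pi \varrho = \sum_{i,j} \alpha_i |\langle x_i | \Pi | y_j \rangle|^2$$

$$\mathrm{Tr}\, (\mathbb{1} - \Pi) \varsigma = \sum_{i,j} \beta_j |\langle x_i | \mathbb{1} - \Pi | y_j \rangle|^2.$$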



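Thus, for any positive $\eta_1$, $\eta_2$ we have

$$\begin{aligned}
\eta_1 P_{e|\varrho} + \eta_2 P_{e|\varsigma} &\geq \sum_{i,j} \min(\eta_1 \alpha_i, \eta_2 \beta_j) \left( |\langle x_i | \Pi | y_j \rangle|^2 + |\langle x_i | \mathbb{1} - \Pi | y_j \rangle|^2 \right) \\
&\geq \sum_{i,j} \min(\eta_1 \alpha_i, \eta_2 \beta_j) \frac{|\langle x_i | y_j \rangle|^2}{2} \\
&\geq \frac{1}{2} \sum_{i,j} \min\left( \eta_1 \alpha_i |\langle x_i | y_j \rangle|^2, \eta_2 \beta_j |\langle x_i | y_j \rangle|^2 \right)
\end{aligned} \tag{85}$$

where the second last inequality is motivated by the fact that for any two complex numbers $a$, $b$ we have $|a|^2 + |b|^2 \geq |a + b|^2 / 2$.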

Now, following [18], consider the two probability distributions defined by the Nussbaum-Szkoła mapping

$$P_1(i,j) = \alpha_i |\langle x_i | y_j \rangle|^2, \qquad P_2(i,j) = \beta_j |\langle x_i | y_j \rangle|^2. \tag{86}$$



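These two probability distributions are both positive for at least one pair of $(i,j)$ values, since we assumed $\varrho$, $\varsigma$ to have non-disjoint supports. Furthermore, they have the nice property that

$$\mathrm{Tr}\, \varrho^{1-s} \varsigma^s = \sum_{i,j} P_1(i,j)^{1-s} P_2(i,j)^s$$

so that

$$\mu(s) = \log \sum_{i,j} P_1(i,j)^{1-s} P_2(i,j)^s.$$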
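As a numerical sanity check of this property, the following Python sketch builds the two distributions of (86) for a pair of qubit states and compares the operator expression $\mathrm{Tr}\, \varrho^{1-s} \varsigma^s$ with the classical sum over $(i,j)$. The eigenvalues, the rotation angle and all function names are hypothetical choices made for the illustration; real-valued eigenvectors are assumed for simplicity.

```python
import math

def outer(v):
    """Rank-one projector |v><v| for a real 2-vector v."""
    return [[v[0] * v[0], v[0] * v[1]], [v[1] * v[0], v[1] * v[1]]]

def power(eigvals, eigvecs, t):
    """Matrix sum_i eigvals[i]^t |v_i><v_i| built from a spectral decomposition."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in zip(eigvals, eigvecs):
        proj = outer(v)
        for a in range(2):
            for b in range(2):
                m[a][b] += lam ** t * proj[a][b]
    return m

def trace_product(a, b):
    """Tr(AB) for 2x2 real matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

# First state: diagonal in the computational basis.
alpha, x = [0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]]
# Second state: diagonal in a basis rotated by 0.4 rad.
c, d = math.cos(0.4), math.sin(0.4)
beta, y = [0.6, 0.4], [[c, d], [-d, c]]

# Nussbaum-Szkola mapping (86).
overlap = [[sum(xi[k] * yj[k] for k in range(2)) ** 2 for yj in y] for xi in x]
P1 = [[alpha[i] * overlap[i][j] for j in range(2)] for i in range(2)]
P2 = [[beta[j] * overlap[i][j] for j in range(2)] for i in range(2)]

def quantum_side(s):
    """Tr(rho^(1-s) sigma^s) computed with 2x2 matrices."""
    return trace_product(power(alpha, x, 1 - s), power(beta, y, s))

def classical_side(s):
    """sum_{i,j} P1(i,j)^(1-s) P2(i,j)^s."""
    return sum(P1[i][j] ** (1 - s) * P2[i][j] ** s
               for i in range(2) for j in range(2))

for s in (0.1, 0.5, 0.9):
    print(s, quantum_side(s), classical_side(s))
```

The two columns agree up to rounding, even though the two states do not commute, which is what allows the classical tilting argument to be reused in the sequel.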
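Following [9, Th. 5], define

$$Q_s(i,j) = \frac{P_1(i,j)^{1-s} P_2(i,j)^s}{\sum_{i',j'} P_1(i',j')^{1-s} P_2(i',j')^s} \tag{87}$$

and observe that

\begin{align}
\mu'(s) &= E_{Q_s}\left[ \log(P_2/P_1) \right] \tag{88} \\
\mu''(s) &= \mathrm{Var}_{Q_s}\left[ \log(P_2/P_1) \right] \tag{89}
\end{align}
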



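where the subscript $Q_s$ means that the expected values are with respect to the probability distribution $Q_s$. Hence, if one defines the set

$$Y_s = \left\{ (i,j) : \left| \log\left( \frac{P_2(i,j)}{P_1(i,j)} \right) - \mu'(s) \right| \leq \sqrt{2\mu''(s)} \right\} \tag{90}$$

then $\sum_{(i,j) \in Y_s} Q_s(i,j) > 1/2$, by Chebyshev's inequality. It is easily checked using the definitions (87) and (90) that for each $(i,j) \in Y_s$ the distribution $Q_s$ satisfies

$$Q_s(i,j) \leq P_1(i,j) \left( \exp\left[ \mu(s) - s\mu'(s) - s\sqrt{2\mu''(s)} \right] \right)^{-1} \tag{91}$$

$$Q_s(i,j) \leq P_2(i,j) \left( \exp\left[ \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} \right] \right)^{-1}. \tag{92}$$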



Hence, in $Y_s$, $Q_s(i,j)$ is bounded by the minimum of the two expressions on the right hand side of (91) and (92). If we call $\eta_1$ the coefficient of $P_1(i,j)$ in (91) and $\eta_2$ the coefficient of $P_2(i,j)$ in (92), then we obtain

$$\begin{aligned}
\frac{1}{2} &< \sum_{(i,j) \in Y_s} Q_s(i,j) \\
&\leq \sum_{(i,j) \in Y_s} \min(\eta_1 P_1(i,j), \eta_2 P_2(i,j)) \\
&\leq \sum_{i,j} \min(\eta_1 P_1(i,j), \eta_2 P_2(i,j)).
\end{aligned}$$

Now note that the last expression, by the definition of $P_1$ and $P_2$ in (86), exactly equals the sum in (85). So, with the selected values of $\eta_1$ and $\eta_2$ we have $\eta_1 P_{e|\varrho} + \eta_2 P_{e|\varsigma} > 1/4$. Hence, either $P_{e|\varrho} > \eta_1^{-1}/8$ or $P_{e|\varsigma} > \eta_2^{-1}/8$, concluding the proof.
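The chain of inequalities used in the proof can be checked numerically on a small example. The Python sketch below takes two strictly positive distributions on a common finite set (playing the role of $P_1$ and $P_2$), builds the tilted distribution (87), the set $Y_s$ of (90) and the coefficients $\eta_1$, $\eta_2$ of (91) and (92), and reports the quantities that the proof compares. The distributions and the function names are hypothetical; zero entries, which the general proof allows, are excluded here to keep the logarithms finite.

```python
import math

def sgb_quantities(P1, P2, s):
    """Return (mu, mu', mu'', Q_s) for two positive distributions on the same set."""
    w = [a ** (1 - s) * b ** s for a, b in zip(P1, P2)]
    z = sum(w)
    Q = [wi / z for wi in w]                                   # tilted distribution (87)
    llr = [math.log(b / a) for a, b in zip(P1, P2)]
    mean = sum(qi * l for qi, l in zip(Q, llr))                # mu'(s), eq. (88)
    var = sum(qi * (l - mean) ** 2 for qi, l in zip(Q, llr))   # mu''(s), eq. (89)
    return math.log(z), mean, var, Q

def check_bounds(P1, P2, s):
    """Return (Q_s mass of Y_s, pointwise bounds hold?, sum of min(eta1 P1, eta2 P2))."""
    mu, d1, d2, Q = sgb_quantities(P1, P2, s)
    width = math.sqrt(2 * d2)
    Y = [n for n, (a, b) in enumerate(zip(P1, P2))
         if abs(math.log(b / a) - d1) <= width]                # set Y_s of (90)
    eta1 = math.exp(-(mu - s * d1 - s * width))                # coefficient in (91)
    eta2 = math.exp(-(mu + (1 - s) * d1 - (1 - s) * width))    # coefficient in (92)
    mass = sum(Q[n] for n in Y)
    pointwise = all(Q[n] <= min(eta1 * P1[n], eta2 * P2[n]) * (1 + 1e-12) for n in Y)
    total = sum(min(eta1 * a, eta2 * b) for a, b in zip(P1, P2))
    return mass, pointwise, total

P1 = [0.5, 0.3, 0.15, 0.05]   # hypothetical example
P2 = [0.1, 0.2, 0.3, 0.4]     # hypothetical example
for s in (0.2, 0.5, 0.8):
    print(s, check_bounds(P1, P2, s))
```

For each $s$ the mass of $Y_s$ exceeds $1/2$, the pointwise bounds hold on $Y_s$, and the final sum exceeds $1/2$, in agreement with the three steps of the argument.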



V. Sphere Packing Bound for 
Classical-Quantum Channels 

The purpose of this section is to adapt the proof of the sphere-packing bound in [9, Sec. IV] to the case of quantum channels. This results in the following theorem.

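Theorem 5 (Sphere Packing Bound): For all positive rates $R$ and all positive $\varepsilon < R$,

$$E(R) \leq E_{sp}(R - \varepsilon), \tag{93}$$

where $E_{sp}(R)$ is defined by the relations

\begin{align}
E_{sp}(R) &= \sup_{\rho \geq 0} \left[ E_0(\rho) - \rho R \right] \tag{94} \\
E_0(\rho) &= \max_{\mathbf{p}} E_0(\rho, \mathbf{p}) \tag{95} \\
E_0(\rho, \mathbf{p}) &= -\log \mathrm{Tr} \left( \sum_{k=1}^{K} p_k S_k^{1/(1+\rho)} \right)^{1+\rho}. \tag{96}
\end{align}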



Remark 2: For some channels, the function $E_{sp}(R)$ can be infinite for $R$ small enough. The role of the arbitrarily small constant $\varepsilon$ is only important for one single value of the rate $R = R_\infty$, which is the infimum of the rates $R$ such that $E_{sp}(R)$ is finite.
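For a feeling of how the relations (94)-(96) behave, the following Python sketch evaluates $E_{sp}(R)$ for a classical binary symmetric channel, seen as a classical-quantum channel with commuting states. It assumes, as is standard for symmetric channels, that the uniform input distribution achieves the maximum in (95); the crossover probability `delta`, the truncation `rho_max` of the supremum and the function names are choices made for the illustration only.

```python
import math

def e0_bsc(rho, delta):
    """E_0(rho) of (95)-(96) for a BSC with crossover delta and uniform input."""
    a = (1 - delta) ** (1 / (1 + rho)) + delta ** (1 / (1 + rho))
    return rho * math.log(2) - (1 + rho) * math.log(a)

def esp_bsc(R, delta, rho_max=50.0, steps=50000):
    """Grid approximation of E_sp(R) = sup_{rho >= 0} [E_0(rho) - rho * R]."""
    return max(e0_bsc(k * rho_max / steps, delta) - (k * rho_max / steps) * R
               for k in range(steps + 1))

delta = 0.1
# Capacity in nats: log 2 minus the binary entropy of delta.
capacity = math.log(2) + delta * math.log(delta) + (1 - delta) * math.log(1 - delta)

for R in (0.1, 0.2, 0.3, capacity):
    print(f"R = {R:.4f} nats   E_sp(R) ~ {esp_bsc(R, delta):.4f}")
```

The exponent is positive below capacity, decreases with the rate and vanishes at capacity. For this channel all transition probabilities are positive, so $E_{sp}(R)$ stays finite for every positive rate; the divergence mentioned in the remark only appears for channels with a zero-error capacity.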

We will follow closely the proof given in [9, Sec. IV] for the classical case. It is the author's belief that the proof of the sphere-packing bound used in [9] is not widely known, especially within the quantum information theory community, because, as explained in the introduction, the much simpler approach used in [23] has become much more popular.^7

Furthermore, some intermediate steps in the proof clearly have to be adjusted from the classical case to the quantum case, and this is not always a trivial task. Hence, for the reader's convenience, we prefer to go through the whole proof used in [9], speaking directly in terms of quantum channels and trying to simplify it as much as possible in view of the weaker results that we are pursuing with respect to [9, Th. 5] (we are here only interested in the asymptotic first order exponent, while in [9] bounds for fixed $M$ and $N$ are obtained).

Proof: The key point is using Fano's idea [8] of bounding the probability of error for at least one codeword $\mathbf{w}_m$ by studying a binary hypothesis testing problem between $\mathbf{S}_{\mathbf{w}_m}$ and a dummy state $\mathbf{f}$, which is only used as a measure for the decision operator $\Pi_m$. Roughly speaking, we will show that there exists one $m$ and a state $\mathbf{f}$ such that

- the probability under state $\mathbf{f}$ of the outcome associated to the decision for message $m$, call it $P(m|\mathbf{f}) = \mathrm{Tr}(\Pi_m \mathbf{f})$, is small;

- state $\mathbf{f}$ is only distinguishable from $\mathbf{S}_{\mathbf{w}_m}$ to a certain degree in a binary detection test.

Using Theorem 4, this will imply that the probability $P(m|m)$ cannot be too high. The whole proof is devoted to the construction of such a state $\mathbf{f}$, which has to be chosen properly depending on the code. We are now ready to start the detailed proof.

We first simplify the problem using a very well known observation, that is, the fact that for the study of $E(R)$ we can restrict attention to constant composition codes. It is well known that every code with rate $R$ and block length $N$ contains a constant composition subcode of rate $R' = R - o(1)$, where $o(1)$ goes to zero when $N$ goes to infinity (see [23], [27], [24]). This is due to the fact that the number of different compositions of codewords of length $N$ is only polynomial in $N$ while the code size is exponential. Hence, we will focus on this constant composition subcode and consider it as our initial code. Let thus our code have $M$ codewords with the same composition. Let $c_k$ be the number of occurrences of symbol $k$ in each word and define $p_k$ as the ratio $c_k/N$, so that the vector $\mathbf{p} = (p_1, p_2, \ldots, p_K)$ is a probability distribution over the $K$ input symbols.

^7 Viterbi and Omura [27] describe the proof of the sphere-packing bound of [9] as "an intellectual tour-de-force", even if characterized by "flavor, style, elegance", and Gallager himself describes it as "quite complicated" [42] and "tedious and subtle to derive" [26]. See Appendix B for some historical comments on the proof of the theorem in the classical case.
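The counting argument behind the restriction to constant composition codes can be made explicit. The number of distinct compositions of words of length $N$ over $K$ symbols is the binomial coefficient $\binom{N+K-1}{K-1}$, so by the pigeonhole principle the largest constant composition subcode loses at most the logarithm of this number, divided by $N$, in rate. The short Python sketch below (function names are illustrative) shows how quickly this loss vanishes.

```python
import math

def num_compositions(N, K):
    """Number of distinct compositions (types) of length-N words over K symbols."""
    return math.comb(N + K - 1, K - 1)

def rate_loss(N, K):
    """Upper bound (in nats) on R - R' for the largest constant composition subcode."""
    return math.log(num_compositions(N, K)) / N

for N in (10, 100, 1000, 10000):
    print(N, num_compositions(N, 4), round(rate_loss(N, 4), 5))
```

Since the number of compositions grows polynomially in $N$, the bound on the rate loss behaves like $(K-1)\log N / N$ and plays the role of the $o(1)$ term.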

Let now $\mathbf{f}$ be a state in $\mathcal{H}^{\otimes N}$. We will first apply Theorem 4 using one of the codewords as state $\varrho$ and $\mathbf{f}$ as state $\varsigma$. This will result in a trade-off between the rate of the code $R$ and the probability of error $P_{e,\max}$, where both quantities will be parameterized in the parameter $s$, a higher rate being allowed if a larger $P_{e,\max}$ is tolerated and vice versa. This trade-off depends of course on $\mathbf{p}$ and $\mathbf{f}$. We will later pick $\mathbf{f}$ properly so as to obtain the best possible bound for a given $R$ valid for all compositions $\mathbf{p}$.

For any $m = 1, \ldots, M$, consider the binary hypothesis testing between $\mathbf{S}_{\mathbf{w}_m}$ and $\mathbf{f}$. We assume that their supports are not disjoint (we will later show that such a choice of $\mathbf{f}$ is possible) and define the quantity

\begin{align}
\mu(s) &= \mu_{\mathbf{S}_{\mathbf{w}_m}, \mathbf{f}}(s) \tag{97} \\
&= \log \mathrm{Tr}\, \mathbf{S}_{\mathbf{w}_m}^{1-s} \mathbf{f}^s. \tag{98}
\end{align}


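Applying Theorem 4 with $\varrho = \mathbf{S}_{\mathbf{w}_m}$, $\varsigma = \mathbf{f}$ and $\Pi = \mathbb{1} - \Pi_m$, we find that for each $s$ in $0 < s < 1$, either

$$\mathrm{Tr}\left[ (\mathbb{1} - \Pi_m) \mathbf{S}_{\mathbf{w}_m} \right] > \frac{1}{8} \exp\left[ \mu(s) - s\mu'(s) - s\sqrt{2\mu''(s)} \right] \tag{99}$$

or

$$\mathrm{Tr}\left[ \Pi_m \mathbf{f} \right] > \frac{1}{8} \exp\left[ \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} \right]. \tag{100}$$

Note now that $\mathrm{Tr}\left[ (\mathbb{1} - \Pi_m) \mathbf{S}_{\mathbf{w}_m} \right] = P_{e|m} \leq P_{e,\max}$ for all $m$. Furthermore, since $\sum_{m=1}^{M} \Pi_m = \mathbb{1}$, for at least one value of $m$ we have $\mathrm{Tr}\left[ \Pi_m \mathbf{f} \right] \leq 1/M = e^{-NR}$. Choosing this particular $m$, we thus obtain from the above two equations that either

$$P_{e,\max} > \frac{1}{8} \exp\left[ \mu(s) - s\mu'(s) - s\sqrt{2\mu''(s)} \right] \tag{101}$$

or

$$R < -\frac{1}{N} \left( \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} - \log 8 \right). \tag{102}$$
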

In these equations we begin to see the aimed trade-off between the rate and the probability of error. It is implicit here in the definition of $\mu(s)$ that both equations depend on $\mathbf{S}_{\mathbf{w}_m}$ and $\mathbf{f}$. Since $m$ has been fixed, we can drop its explicit indication and use simply $\mathbf{w}$ in place of $\mathbf{w}_m$ from this point on. We will now call $R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f})$ the right hand side of (102), that is

$$R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f}) = -\frac{1}{N} \left( \mu(s) + (1-s)\mu'(s) - (1-s)\sqrt{2\mu''(s)} - \log 8 \right). \tag{103}$$

This expression allows us to write $\mu'(s)$ in (101) in terms of $R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f})$ so that, taking the logarithm in equation (101), our conditions can be rewritten as either

$$R < R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f}) \tag{104}$$

or

$$\log \frac{1}{P_{e,\max}} < -\frac{\mu(s)}{1-s} - \frac{s}{1-s} N R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f}) + 2s\sqrt{2\mu''(s)} + \frac{\log 8}{1-s}. \tag{105}$$
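The elimination of $\mu'(s)$ that leads from (101) and (103) to (105) is a purely algebraic step, and it can be checked numerically. The Python sketch below picks arbitrary values for $\mu(s)$, $\mu'(s)$, $\mu''(s)$, $s$ and $N$, computes $R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f})$ from (103), and compares the logarithm of the reciprocal of the right hand side of (101) with the second alternative written without normalizing by $N$. The numerical values and the function name are arbitrary and serve only as a test of the algebra.

```python
import math
import random

def check_105(mu, d1, d2, s, N):
    """Return both sides of the identity behind the second alternative."""
    root = math.sqrt(2 * d2)
    # R(s, S_w, f) as defined in (103).
    R_tilde = -(mu + (1 - s) * d1 - (1 - s) * root - math.log(8)) / N
    # Logarithm of the reciprocal of the right hand side of (101).
    lhs = -(mu - s * d1 - s * root) + math.log(8)
    # Same quantity with mu'(s) eliminated in favour of R_tilde.
    rhs = (-mu / (1 - s) - s / (1 - s) * N * R_tilde
           + 2 * s * root + math.log(8) / (1 - s))
    return lhs, rhs

random.seed(0)
for _ in range(5):
    mu = -random.uniform(0.1, 5.0)   # mu(s) is non-positive
    d1 = random.uniform(-3.0, 3.0)   # mu'(s)
    d2 = random.uniform(0.0, 2.0)    # mu''(s) is a variance
    s = random.uniform(0.05, 0.95)
    lhs, rhs = check_105(mu, d1, d2, s, N=100)
    print(f"{lhs:.10f}  {rhs:.10f}")
```

The two printed columns coincide, confirming that the factor $2s$ in front of the square root and the term $\log 8/(1-s)$ arise from substituting $\mu'(s)$.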

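At this point, we have to exploit the fact that we are considering a fixed composition code. Since we want our result to depend only on the composition $\mathbf{p}$ and not on the particular sequence $\mathbf{w}$, we choose $\mathbf{f}$ so that the function $\mu(s)$ also only depends on the composition $\mathbf{p}$. We thus choose $\mathbf{f}$ to be the $N$-fold tensor power of a state $f$ in $\mathcal{H}$, that is $\mathbf{f} = f^{\otimes N}$. With this choice, in fact, we easily check that, if $\mathbf{w} = (k_1, k_2, \ldots, k_N)$,

\begin{align}
\mu(s) &= \log \mathrm{Tr}\, \mathbf{S}_{\mathbf{w}}^{1-s} \mathbf{f}^s \tag{106} \\
&= \log \prod_{n=1}^{N} \mathrm{Tr}\, S_{k_n}^{1-s} f^s \tag{107} \\
&= \log \prod_{k=1}^{K} \left( \mathrm{Tr}\, S_k^{1-s} f^s \right)^{N p_k} \tag{108} \\
&= N \sum_{k=1}^{K} p_k \log\left( \mathrm{Tr}\, S_k^{1-s} f^s \right) \tag{109} \\
&= N \sum_{k=1}^{K} p_k \mu_{S_k, f}(s). \tag{110}
\end{align}

Thus, $\mu(s)$ actually only depends on the composition $\mathbf{p}$ and on $f$, and not on the particular $\mathbf{w}$. It is useful to remember that since we assumed the supports of $\mathbf{f}$ and $\mathbf{S}_{\mathbf{w}}$ to be non-disjoint, the supports of $S_k$ and $f$ are not disjoint if $p_k > 0$, so that all terms in the sum are well defined. We simplify our notation writing $\mu_{k,f}(s)$ for $\mu_{S_k,f}(s)$. Note that we also have

\begin{align}
\mu'(s) &= N \sum_k p_k \mu'_{k,f}(s) \tag{111} \\
\mu''(s) &= N \sum_k p_k \mu''_{k,f}(s). \tag{112}
\end{align}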
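The factorization in (106)-(110) can be illustrated in the commuting case, where each $S_k$ and $f$ is a probability vector and the trace of a tensor product is the product of the single-letter sums. The Python sketch below computes $\mu(s)$ letter by letter for a codeword and compares it with the composition formula (110). The states, the auxiliary state and the codewords are hypothetical examples.

```python
import math
from collections import Counter

def overlap(S, f, s):
    """Tr(S^(1-s) f^s) for commuting states given by their diagonals."""
    return sum(a ** (1 - s) * b ** s for a, b in zip(S, f))

def mu_codeword(word, states, f, s):
    """mu(s) of (106)-(107): the trace factorizes over the letters of the word."""
    return math.log(math.prod(overlap(states[k], f, s) for k in word))

def mu_composition(word, states, f, s):
    """Right hand side of (110): N * sum_k p_k * mu_{k,f}(s)."""
    N = len(word)
    counts = Counter(word)
    return N * sum(c / N * math.log(overlap(states[k], f, s))
                   for k, c in counts.items())

states = {0: [0.9, 0.1], 1: [0.2, 0.8], 2: [0.5, 0.5]}   # hypothetical diagonal S_k
f = [0.4, 0.6]                                            # hypothetical diagonal f
w1 = [0, 1, 1, 2, 0, 1]
w2 = [1, 1, 0, 0, 2, 1]    # same composition as w1, different order

print(mu_codeword(w1, states, f, 0.3), mu_composition(w1, states, f, 0.3))
print(mu_codeword(w2, states, f, 0.3))
```

Both codewords give the same value, which depends on the word only through its composition, as claimed after (110).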

With the same procedure used to obtain (89) using the Nussbaum-Szkoła mapping (86), we see that for fixed $f$ and $s$, $\mu''_{k,f}(s)$ is a variance of a finite random variable and it is thus non-negative and bounded by a constant for all $k$. Taking the largest of these constants, say $C_f(s)$, we find that

$$\mu''(s) \leq N C_f(s). \tag{113}$$

The essential point here is that the contribution of $\mu(s)$ and $\mu'(s)$ in our bounds will grow linearly in $N$, while the contribution of $\mu''(s)$ will only grow with $\sqrt{N}$. Hence, the terms involving $\mu''(s)$ in our equations will not have any effect on the first order exponent of our bounds. A formalization of this fact, however, is tricky. In [9] the effect of $\mu''(s)$
in the classical case is dealt with by exploiting the fact that $\mu''_{k,f}(s)$ is a variance and proving that, uniformly over $s$ and $f$, $s\sqrt{\mu''_{k,f}(s)} \leq \log(e/\sqrt{P_{\min}})$, where $P_{\min}$ is the smallest non-zero transition probability of the channel. This allows one to proceed to obtain a bound valid for finite $N$. In our case, this procedure appears to be more complicated. If $\mu_{k,f}(s)$ is studied in the quantum domain of operators $S_k$ and $f$, then $\mu''_{k,f}(s)$ is not a variance, and thus a different approach must be studied; if $\mu_{k,f}(s)$ is studied by means of the Nussbaum-Szkoła mapping, then in (89) both $P_1$ and $P_2$ vary when only $f$ varies, and thus there is no such $P_{\min}$ to be used. For this reason, we need to take a different approach and we content ourselves with finding a bound on $E(R)$ using the asymptotic regime $N \to \infty$.

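Simplifying again the notation in light of the previous observations, let us write $R(s, \mathbf{p}, f)$ for $R(s, \mathbf{S}_{\mathbf{w}}, \mathbf{f})$. Using the obtained expression for $\mu(s)$, our conditions are either

$$R < R(s, \mathbf{p}, f) \tag{114}$$

or

$$\frac{1}{N} \log \frac{1}{P_{e,\max}} < -\frac{1}{1-s} \sum_k p_k \mu_{k,f}(s) - \frac{s}{1-s} R(s, \mathbf{p}, f) + \frac{1}{N} \left( 2s\sqrt{2\mu''(s)} + \frac{\log 8}{1-s} \right). \tag{115}$$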

Now we come to the most critical step. Given a rate $R$, we want to bound $P_{e,\max}$ for all codes. We can fix first the composition of the code, bound the probability of error, and then find the best possible composition. Since we can choose $s$ and $f$, for a given $R$ and $\mathbf{p}$, we will choose them so that the first inequality is not satisfied, which will imply that the second one is, thus bounding $P_{e,\max}$.

The point here is that we are free to choose $s$ and $f$, but we then need to optimize the composition $\mathbf{p}$ in order to have a bound valid for all codes. This direct approach, even in the classical case, turns out to be very complicated (see [18, Sec. 9.3 and 9.4, pag. 188-303] for a detailed and nevertheless instructive analysis). The authors in [9] thus proceed in a more synthetic way by stating the resulting optimal $f$ and $\mathbf{p}$ as a function of $s$ and then proving that this choice leads to the desired bound. Here, we will follow this approach, showing that the same reasoning can be applied also to the case of quantum channels.

It is important to point out that it is not possible to simply convert the quantum problem to the classical one using the Nussbaum-Szkoła mapping (86) directly on the states $S_k$ and $f$ and then using the construction of [9, eqs. (4.18)-(4.20)] on the obtained classical distributions. In fact, in (86), even if one of the two states is kept fixed and only the other one varies, both distributions vary. Thus, even if $f$ is kept fixed, the effect of varying $S_k$ for the different values of $k$ would not be compatible with the fact that in [9, eq. (4.20)] a fixed $f_s$ has to be used which cannot depend on $k$. Fortunately, it is instead possible to exactly replicate the steps used in [9] by correctly reinterpreting the construction of $f$ and $\mathbf{p}$ in the quantum setting.

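For any $s$ in the interval $0 < s < 1$, define

$$\alpha(s, \mathbf{p}) = \sum_{k=1}^{K} p_k S_k^{1-s}. \tag{116}$$

Let then $\mathbf{p}_s = (p_{1,s}, \ldots, p_{K,s})$ be the distribution that minimizes the expression

$$\mathrm{Tr}\, \alpha(s, \mathbf{p})^{1/(1-s)} \tag{117}$$

which surely admits a minimum in the simplex of probability distributions. As observed by Holevo^8 [14, eq. (38)], the distribution $\mathbf{p}_s$ that achieves the minimum satisfies the conditions

$$\mathrm{Tr}\left( S_k^{1-s} \alpha_s^{s/(1-s)} \right) \geq \mathrm{Tr}\left( \alpha_s^{1/(1-s)} \right), \qquad k = 1, \ldots, K \tag{118}$$

where

\begin{align}
\alpha_s &= \alpha(s, \mathbf{p}_s) \tag{119} \\
&= \sum_{k=1}^{K} p_{k,s} S_k^{1-s}. \tag{120}
\end{align}

Furthermore, equation (118) is satisfied with equality for those $k$ with $p_{k,s} > 0$, as can be verified by multiplying it by $p_{k,s}$ and summing over $k$. Then, we define

$$f_s = \frac{\alpha_s^{1/(1-s)}}{\mathrm{Tr}\, \alpha_s^{1/(1-s)}}. \tag{121}$$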



Since we can choose $s$ and $f$ freely, we will now tie the operator $f$ to the choice of $s$, using $f_s$ for $f$. We only have to keep in mind that $\mu'(s)$ and $\mu''(s)$ are computed by holding $f$ fixed. The vector $\mathbf{p}_s$ will instead be used later. Note further that we fulfill the requirement that $f$ and $S_k$ have non-disjoint supports, since the left hand side in (118) must be positive for all $k$.
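The optimality conditions (118) can be explored numerically for classical channels, for which the operators $S_k$ are diagonal and all traces reduce to sums over the output alphabet. The Python sketch below evaluates both sides of (118) for a given input distribution and, for two-input channels, locates the minimizer of (117) by a simple grid search. The channel matrices, the grid resolution and the function names are assumptions of this illustration and are not part of the proof.

```python
import math

def alpha(p, W, s):
    """Diagonal of alpha(s, p) = sum_k p_k S_k^(1-s) for a classical channel W[k][y]."""
    return [sum(p[k] * W[k][y] ** (1 - s) for k in range(len(W)))
            for y in range(len(W[0]))]

def objective(p, W, s):
    """Tr alpha(s, p)^(1/(1-s)), the quantity minimized in (117)."""
    return sum(a ** (1 / (1 - s)) for a in alpha(p, W, s))

def kkt_sides(p, W, s):
    """Left hand sides of (118) for each k, and the common right hand side."""
    a = alpha(p, W, s)
    lhs = [sum(W[k][y] ** (1 - s) * a[y] ** (s / (1 - s)) for y in range(len(a)))
           for k in range(len(W))]
    return lhs, objective(p, W, s)

def minimize_binary(W, s, steps=10000):
    """Grid search for the minimizing input distribution of a two-input channel."""
    best = min(range(steps + 1),
               key=lambda n: objective([n / steps, 1 - n / steps], W, s))
    return [best / steps, 1 - best / steps]

bsc = [[0.9, 0.1], [0.1, 0.9]]   # binary symmetric channel (hypothetical parameters)
zch = [[1.0, 0.0], [0.3, 0.7]]   # Z-channel (hypothetical parameters)

p_bsc = minimize_binary(bsc, 0.5)
print("BSC minimizer:", p_bsc, kkt_sides(p_bsc, bsc, 0.5))
p_z = minimize_binary(zch, 0.5)
print("Z-channel minimizer:", p_z, kkt_sides(p_z, zch, 0.5))
```

For the symmetric channel the minimizer is the uniform distribution and both conditions hold with equality. For any input distribution, the weighted sum of the left hand sides with weights $p_k$ equals the right hand side, which is the identity behind the remark that equality holds whenever $p_{k,s} > 0$.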

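As in [9, eqs. (4.21)-(4.22)], we see that, using $f_s$ in place of $f$ in the definition of $\mu_{k,f}(s)$, we get

$$\mu_{k,f_s}(s) = \log \mathrm{Tr}\left( S_k^{1-s} \alpha_s^{s/(1-s)} \right) - s \log \mathrm{Tr}\, \alpha_s^{1/(1-s)}. \tag{122}$$

Using (118) we then see that

\begin{align}
\mu_{k,f_s}(s) &\geq (1-s) \log \mathrm{Tr}\, \alpha_s^{1/(1-s)} \tag{123} \\
&= -(1-s) E_0\left( \frac{s}{1-s}, \mathbf{p}_s \right) \tag{124} \\
&= -(1-s) E_0\left( \frac{s}{1-s} \right) \tag{125}
\end{align}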
(125) 



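with equality if $p_{k,s} > 0$. Here, we have used equation (120), the definitions (96) and (95), and the fact that $\mathbf{p}_s$ minimizes (117). Thus, with the choice of $f = f_s$, equations (114) and (115) can be rewritten as (for each $s$) either

$$R < R(s, \mathbf{p}, f_s) \tag{126}$$

or

$$\frac{1}{N} \log \frac{1}{P_{e,\max}} < E_0\left( \frac{s}{1-s} \right) - \frac{s}{1-s} R(s, \mathbf{p}, f_s) + \frac{2s\sqrt{2}}{\sqrt{N}} \sqrt{\sum_k p_k \mu''_{k,f_s}(s)} + \frac{1}{N} \frac{\log 8}{1-s} \tag{127}$$

where

$$R(s, \mathbf{p}, f_s) = -\sum_k p_k \left( \mu_{k,f_s}(s) + (1-s)\mu'_{k,f_s}(s) \right) + \frac{1}{\sqrt{N}} (1-s) \sqrt{2 \sum_k p_k \mu''_{k,f_s}(s)} + \frac{1}{N} \log 8. \tag{128}$$

^8 The variable $s$ in [14] corresponds to our $s/(1-s)$, that we call $\rho$ here in accordance with the consolidated classical notation.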



Now, for a fixed $R$, we are free to choose $s$ and then use the two conditions. Using a procedure similar to that used in [9, pag. 100-102], it is proved in Appendix A that $R(s, \mathbf{p}, f_s)$ is a continuous function of $s$ in the interval $0 < s < 1$. Thus, for fixed $R$, we can only have three possibilities:

1) $R = R(s, \mathbf{p}, f_s)$ for some $s$ in $(0,1)$;

2) $R > R(s, \mathbf{p}, f_s)$ $\forall s \in (0,1)$;

3) $R < R(s, \mathbf{p}, f_s)$ $\forall s \in (0,1)$.

Dealing with these possibilities for a fixed code is more complicated in our case than in [9] due to the fact that we have not been able to bound uniformly the second derivatives $\mu''_{k,f_s}(s)$ for $s \in (0,1)$. Thus, we have to depart slightly from [9]. Instead of considering a fixed code of block length $N$, consider sequences of codes of increasing block length. From the definition of $E(R)$, it is obvious that there exists a sequence of codes of block lengths $N_1, N_2, \ldots, N_n, \ldots$, and rates $R_1, R_2, \ldots, R_n, \ldots$ such that $R = \lim_n R_n$ and

$$E(R) = \lim_{n \to \infty} -\frac{1}{N_n} \log P_{e,\max}^{(N_n)}(R_n). \tag{129}$$



Each code of the sequence will in general have a different composition^9 $\mathbf{p}_n$ but must anyway fall in one of the above three cases. Thus, one of those cases is verified infinitely often. Since the compositions $\mathbf{p}_n$ are in a bounded set, there exists a subsequence of codes such that $\mathbf{p}_n$ converge to, say, $\mathbf{p}$. Thus, we can directly assume this subsequence is our own sequence and safely assume that $\mathbf{p}_n \to \mathbf{p}$.

Suppose now that case (1) is verified infinitely often. Thus, for infinitely many $n$, there is an $s = s_n$ in the interval $0 < s < 1$ such that $R_n = R(s_n, \mathbf{p}_n, f_{s_n})$. Hence, since the values $s_n$ are in the interval $(0,1)$, there must exist an accumulation point for the $s_n$ in the closed interval $[0,1]$. We will first assume that such an accumulation point $\bar{s}$ exists satisfying $0 < \bar{s} < 1$. A subsequence of codes then exists with the $s_n$ tending to $\bar{s}$. Let this subsequence be our new sequence. We can first substitute $R(s_n, \mathbf{p}_n, f_{s_n})$ with $R_n$ in (127). Letting then $n \to \infty$, we find that $R_n \to R$ and the last two terms on the right hand side of (127) vanish, since $\mu''_{k,f_s}(s)$ is bounded for $s$ sufficiently close to $\bar{s} \neq 0, 1$. Hence, we obtain

\begin{align}
E(R) &\leq E_0\left( \frac{\bar{s}}{1-\bar{s}} \right) - \frac{\bar{s}}{1-\bar{s}} R \tag{130} \\
&\leq \sup_{\rho \geq 0} \left( E_0(\rho) - \rho R \right) \tag{131} \\
&= E_{sp}(R). \tag{132}
\end{align}

^9 With some abuse of notation, we now use $\mathbf{p}_n$ where $n$ is an index for the sequence and obviously does not have anything to do with the $s$ of $\mathbf{p}_s$.



If the only accumulation point for the $s_n$ is $\bar{s} = 1$ or $\bar{s} = 0$, the above procedure cannot be applied since we cannot get rid of the last two terms in (127) by letting $n \to \infty$, because we have not bounded $\mu''_{k,f_s}(s)$ uniformly over $s \in (0,1)$, and it may well be that $\mu''_{k,f_s}(s)$ is unbounded near $s = 0$ or $s = 1$. These cases, however, can be handled with the same procedure used for cases (2) and (3) to be discussed below.

Suppose thus that case (2) above is verified infinitely often, or that case (1) is verified infinitely often with the only accumulation point $\bar{s} = 0$ for the values $s_n$. Then, since $R_n \to R$ and $\mathbf{p}_n \to \mathbf{p}$, we easily deduce that for all $s \in (0,1)$ we have $R(s, \mathbf{p}, f_s) \leq R$, that is, $R(s, \mathbf{p}, f_s)$ is also bounded over $s$. Given any $\varepsilon_1 > 0$, for any fixed $s \in [\varepsilon_1, 1)$ we must have $R(s, \mathbf{p}_n, f_s) < R_n$ infinitely often, so that we can focus on the subsequence of codes with this property. Since condition (126) is not satisfied for these codes, (127) must be satisfied. Since $s \in [\varepsilon_1, 1)$ is fixed, we can let $n \to \infty$ so that the last two terms on the right hand side of (127) vanish again and we get that for all $s \in [\varepsilon_1, 1)$

$$E(R) \leq E_0\left( \frac{s}{1-s} \right) - \frac{s}{1-s} R(s, \mathbf{p}, f_s). \tag{133}$$

Letting then $\varepsilon_1 \to 0$, we can let $s \to 0$ as well. Using the fact that $R(s, \mathbf{p}, f_s)$ is bounded for $s \in (0,1)$, and using the known properties of $E_0(\cdot)$, we find that $E(R) \leq 0$. Thus, surely $E(R) \leq E_{sp}(R)$, proving the theorem in this case.

Finally, suppose now that either case (3) above is verified for infinitely many $n$ or that case (1) is verified infinitely often with the only accumulation point $\bar{s} = 1$ for the values $s_n$. Given any $\varepsilon_1 > 0$, for all $s \in (0, 1-\varepsilon_1]$, the inequality $R_n < R(s, \mathbf{p}_n, f_s)$ is verified infinitely often. Let us focus again on this subsequence as if it was our sequence. For any fixed $s \in (0, 1-\varepsilon_1]$, taking the limit $n \to \infty$ in (126) with $R_n$ in place of $R$, remembering that we are working on sequences of codes such that $\mathbf{p}_n \to \mathbf{p}$ and $R_n \to R$, we obtain

\begin{align}
R &\leq -\sum_k p_k \mu_{k,f_s}(s) - (1-s) \sum_k p_k \mu'_{k,f_s}(s) \tag{134} \\
&= \sum_k p_k \left( -\mu_{k,f_s}(s) - (1-s)\mu'_{k,f_s}(s) \right). \tag{135}
\end{align}

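Now, by the fact that $\mu_{k,f}(s)$ is convex and non-positive for all $f$, it is possible to observe that $\mu_{k,f_s}(s) - s\mu'_{k,f_s}(s) \leq 0$, which implies that $-\mu'_{k,f_s}(s) \leq -\mu_{k,f_s}(s)/s$. Thus, for all $s \in (0, 1-\varepsilon_1]$,

\begin{align}
R &\leq -\frac{1}{s} \sum_k p_k \mu_{k,f_s}(s) \tag{136} \\
&\leq \frac{1-s}{s} E_0\left( \frac{s}{1-s} \right) \tag{137}
\end{align}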



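where in the last step we have used the fact that, as seen from (125), the choice $f = f_s$ implies $\mu_{k,f_s}(s) \geq -(1-s) E_0(s/(1-s))$. Calling now $\rho = s/(1-s)$, we find that for all $\rho \leq (1-\varepsilon_1)/\varepsilon_1$

$$R \leq \frac{E_0(\rho)}{\rho}.$$

Hence, for any $\varepsilon_2 > 0$, we find

\begin{align*}
E_{sp}(R - \varepsilon_2) &= \sup_{\rho \geq 0} \left( E_0(\rho) - \rho(R - \varepsilon_2) \right) \\
&\geq \sup_{0 < \rho \leq (1-\varepsilon_1)/\varepsilon_1} \left( E_0(\rho) - \rho(R - \varepsilon_2) \right) \\
&\geq \sup_{0 < \rho \leq (1-\varepsilon_1)/\varepsilon_1} \rho \varepsilon_2 \\
&= \frac{(1-\varepsilon_1)\varepsilon_2}{\varepsilon_1}.
\end{align*}

It then suffices to let $\varepsilon_1 \to 0$ to see that $E_{sp}(R - \varepsilon_2)$ is unbounded for any $\varepsilon_2 > 0$, which obviously implies that $E(R) \leq E_{sp}(R - \varepsilon_2)$ for all positive $\varepsilon_2$, thus concluding the proof of the theorem. ■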

In the high rate region, the obtained expression for the 
upper bound to the reliability function coincides with the 
random coding expression which is respectively proved and 
conjectured to represent a lower bound to the reliability of 
pure-state and mixed-state channels. Thus, the sphere-packing 
bound is proved to be tight in the high rate region for pure- 
state channels and we may as well conjecture that it is tight 
in the general case. 

Again as a recurring theme, however, the sphere-packing bound is provably not tight in the low rate region. As opposed to the classical case, however, in the classical-quantum case the sphere-packing bound has some important properties that allow one to derive interesting bounds by means of auxiliary channels. In particular, for channels with a zero-error capacity, it allows one to find finite bounds to $E(R)$ down to rates that are at least as small as the Lovász theta function, which emerges naturally as a side result of a more general framework. The next two sections are devoted to such bounds, which are particularly interesting for channels with a zero-error capacity. For channels with no zero-error capacity, tighter bounds can usually be derived by different techniques. These channels are considered in Section IX.

VI. Information Radii, Zero-Error Capacity and
the Lovász Theta Function

It is known that the channel capacity in the classical case can be expressed as the "information radius" of the conditional probabilities of the channel, the metric being the Kullback-Leibler divergence in that case. In the classical case, as made clear by Csiszár [25], a similar expression holds for what has sometimes been called generalized capacity or generalized cut-off rate^10 (see [15] for a detailed discussion). Here we give an analogous formulation for the classical-quantum case. It is important to remark that this property of the function $E_{sp}(R)$ was already observed in [9, eq. (4.23)], although stated in slightly different terms, and that it is essentially this property that is used in the proof of the sphere-packing bound for the determination of the optimal auxiliary state $f$. The purpose of this section is to give an evident presentation of the remarkable similarity between Lovász's approach to bounding the zero-error capacity and the sphere-packing bound method for bounding $E(R)$. As will be clear from this analysis, the sphere-packing bound, once extended to the classical-quantum context, allows one to recover Lovász's result in a more general probabilistic setting.

^10 Since we also consider the zero-error capacity $C_0$ of channels in this paper, we prefer to avoid any reference to generalized capacities in the sense of [25]. Instead, we prefer to adopt the notation of Savage [36, eq. (15)]. In light of Arikan's results [44], this may also be more appropriate than the notation in [45, eq. (5)].

The sphere-packing bound $E_{sp}(R)$ is the upper envelope of all the lines $E_0(\rho) - \rho R$; an important quantity is the value $R_\rho = E_0(\rho)/\rho$ at which each of these lines meets the $R$ axis, that is

$$R_\rho = \max_{\mathbf{p}} -\frac{1}{\rho} \log \mathrm{Tr} \left( \sum_k p_k S_k^{1/(1+\rho)} \right)^{1+\rho}. \tag{138}$$



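The main result of this section is the following theorem.

Theorem 6: The rate $R_\rho$ defined above satisfies

$$R_\rho = \min_f \max_k \frac{1+\rho}{\rho} \log \frac{1}{\mathrm{Tr}\left( S_k^{1/(1+\rho)} f^{\rho/(1+\rho)} \right)}.$$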



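Proof: Setting $s = \rho/(1+\rho)$, we can write

\begin{align}
R_\rho &= \max_{\mathbf{p}} -\frac{1-s}{s} \log \mathrm{Tr} \left( \sum_k p_k S_k^{1-s} \right)^{1/(1-s)} \tag{139} \\
&= \max_{\mathbf{p}} -\frac{1}{s} \log \left[ \mathrm{Tr} \left( \sum_k p_k S_k^{1-s} \right)^{1/(1-s)} \right]^{1-s} \tag{140}
\end{align}

and, using $\alpha(s, \mathbf{p})$ as defined in (116), we can write

$$R_\rho = \max_{\mathbf{p}} -\frac{1}{s} \log \|\alpha(s, \mathbf{p})\|_{1/(1-s)} \tag{141}$$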



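where $\|\cdot\|_r$ is the Schatten $r$-norm. From the Hölder inequality we know that for any positive operators $A$ and $B$ we have

$$\|A\|_{1/(1-s)} \|B\|_{1/s} \geq \mathrm{Tr}(AB) \tag{142}$$

with equality if and only if $B = \gamma A^{s/(1-s)}$ for some scalar coefficient $\gamma$. Thus we can write

$$\|A\|_{1/(1-s)} = \max_{\|B\|_{1/s} \leq 1} \mathrm{Tr}(AB) \tag{143}$$

where $B$ runs over positive operators in the unit ball in the $(1/s)$-norm. Using this expression for the Schatten norm we obtain

\begin{align}
R_\rho &= \max_{\mathbf{p}} -\frac{1}{s} \log \max_{\|B\|_{1/s} \leq 1} \mathrm{Tr}(\alpha(s, \mathbf{p}) B) \tag{144} \\
&= -\frac{1}{s} \log \min_{\mathbf{p}} \max_{\|B\|_{1/s} \leq 1} \mathrm{Tr} \left( \sum_k p_k S_k^{1-s} B \right). \tag{145}
\end{align}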



In the last expression, the minimum and the maximum are both taken over convex sets and the objective function is linear both in $\mathbf{p}$ and $B$. Thus, we can interchange the order of maximization and minimization to get

\begin{align}
R_\rho &= -\frac{1}{s} \log \max_{\|B\|_{1/s} \leq 1} \min_{\mathbf{p}} \sum_{k=1}^{K} p_k \mathrm{Tr}(S_k^{1-s} B) \tag{146} \\
&= -\frac{1}{s} \log \max_{\|B\|_{1/s} \leq 1} \min_k \mathrm{Tr}(S_k^{1-s} B). \tag{147}
\end{align}

Now, we note that the maximum over $B$ is always achieved on the boundary since all states $S_k$ are positive operators. Changing the dummy variable $B$ with $f = B^{1/s}$ we get

\begin{align}
R_\rho &= -\frac{1}{s} \log \max_f \min_k \mathrm{Tr}(S_k^{1-s} f^s) \tag{148} \\
&= \min_f \max_k \frac{1}{s} \log \frac{1}{\mathrm{Tr}(S_k^{1-s} f^s)} \tag{149}
\end{align}

where $f$ now runs over all density operators. ■
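The identity between the two expressions of $R_\rho$ can be checked numerically in a simple commuting example. The Python sketch below considers a binary symmetric channel, computes $R_\rho = E_0(\rho)/\rho$ with the uniform input (assumed optimal by symmetry), and compares it with a grid search over diagonal states $f = (t, 1-t)$ of the min-max expression. The channel parameter, the grid and the function names are choices made for this illustration; the restriction to diagonal $f$ is legitimate only because the states commute.

```python
import math

def r_rho_max_form(rho, delta):
    """R_rho = E_0(rho)/rho for a BSC, using the uniform input distribution."""
    a = (1 - delta) ** (1 / (1 + rho)) + delta ** (1 / (1 + rho))
    return math.log(2) - (1 + rho) / rho * math.log(a)

def r_rho_minmax_form(rho, delta, steps=2000):
    """min over diagonal f = (t, 1-t) of max_k (1/s) log 1/Tr(S_k^(1-s) f^s)."""
    s = rho / (1 + rho)
    W = [[1 - delta, delta], [delta, 1 - delta]]
    best = math.inf
    for n in range(1, steps):
        f = [n / steps, 1 - n / steps]
        worst = max(-math.log(sum(W[k][y] ** (1 - s) * f[y] ** s
                                  for y in range(2))) / s
                    for k in range(2))
        best = min(best, worst)
    return best

for rho in (0.5, 1.0, 4.0):
    print(rho, r_rho_max_form(rho, 0.1), r_rho_minmax_form(rho, 0.1))
```

The two columns coincide, with the minimum attained at the uniform $f$, which plays the role of the center of the information radius in this example.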
It is obvious that, if all operators $S_k$ commute, then the optimal $f$ is diagonal in the same basis where the $S_k$ are, and we thus recover Csiszár's expression^11 [25, eq. (18)]. Furthermore, for $\rho \to 0$ (that is, $s \to 0$) we obtain the expression of the capacity as an information radius already established for classical-quantum channels [46]. When $\rho = 1$ (that is, $s = 1/2$), we obtain an alternative expression for the so called quantum cut-off rate [47].

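The most important case in our context, however, is the case when $\rho \to \infty$, that is, $s \to 1$, for which we can coherently define

$$R_\infty = \lim_{\rho \to \infty} R_\rho. \tag{150}$$

Let $S_k^0$ be the projector onto the support of $S_k$. We first point out that, by letting $s \to 1$ in equation (141), we can write

\begin{align}
R_\infty &= \max_{\mathbf{p}} -\log \lambda_{\max} \left( \sum_{k=1}^{K} p_k S_k^0 \right) \tag{151} \\
&= -\log \min_{\mathbf{p}} \lambda_{\max} \left( \sum_{k=1}^{K} p_k S_k^0 \right). \tag{152}
\end{align}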



We will not use this expression for now; we only point out that we will use it in the next section for the particular case of pure-state channels. Furthermore, the above expression shows that finding $R_\infty$ for a given channel is a so called eigenvalue problem, a well known special case of linear matrix inequality problems, which can be efficiently solved by numerical methods [48]. If the states $S_k$ commute, the channel is classical and this problem is already known to reduce to a linear programming one.

Here, however, we proceed by using expressions that make more evident the relation with Lovász's theta function. So, taking the limit $\rho \to \infty$, or $s \to 1$, in Theorem 6 we obtain

$$R_\infty = \min_f \max_k \log \frac{1}{\mathrm{Tr}(S_k^0 f)} \tag{153}$$

where the minimum is again over all density operators $f$. It is obvious that the zero-error capacity $C_0$ of the channel is upper bounded by any value of the rate $R$ for which $E_{sp}(R)$ is finite. It is easy to notice that $E_{sp}(R)$ is finite for all $R > R_\infty$, since for these rates we can write

$$E_{sp}(R) \leq \max_{0 \leq \rho \leq \rho'} \left[ E_0(\rho) - \rho R \right] \tag{154}$$

where $\rho'$ is the largest $\rho$ satisfying $R_{\rho'} = R$. Furthermore, for $R < R_\infty$, the function $E_{sp}(R)$ is infinite. Hence, the tightest upper bound to $C_0$ that can be derived from the sphere-packing bound is $R_\infty$. We summarize this in the following theorem.

^11 Note that Csiszár's $\alpha$ parameter equals $1 - s = 1/(1+\rho)$ in our notation, see [25, eq. (16)].

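Theorem 7: For a classical-quantum channel with states $S_k$, $C_0 \leq R_\infty$, where

\begin{align}
R_\infty &= \min_f \max_k \log \frac{1}{\mathrm{Tr}(S_k^0 f)} \tag{155} \\
&= \max_{\mathbf{p}} -\log \lambda_{\max} \left( \sum_{k=1}^{K} p_k S_k^0 \right). \tag{156}
\end{align}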
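As an illustration of the two expressions (155) and (156), consider a classical pentagon channel in which input $k$ is received as output $k$ or $k+1$ (mod 5). For such a commuting example the projectors $S_k^0$ are diagonal indicators of the output supports, the largest eigenvalue is simply the largest diagonal entry, and it suffices here to evaluate both expressions at the uniform distribution. The channel and the function names in the Python sketch below are hypothetical choices made for the example.

```python
import math

K = 5
# Support of S_k for the pentagon channel: outputs k and k+1 (mod 5).
support = [{k, (k + 1) % K} for k in range(K)]

def max_form(p):
    """-log lambda_max(sum_k p_k S_k^0) for diagonal projectors S_k^0, as in (156)."""
    diag = [sum(p[k] for k in range(K) if y in support[k]) for y in range(K)]
    return -math.log(max(diag))

def min_form(f):
    """max_k log 1/Tr(S_k^0 f) for a diagonal density operator f, as in (155)."""
    return max(-math.log(sum(f[y] for y in support[k])) for k in range(K))

uniform = [1 / K] * K
lower = max_form(uniform)   # any p gives a lower bound on the maximum in (156)
upper = min_form(uniform)   # any f gives an upper bound on the minimum in (155)
print(lower, upper, math.log(5 / 2))
```

Since the lower and the upper bound coincide, $R_\infty = \log(5/2)$ for this channel. This is larger than $\log\sqrt{5}$, the zero-error capacity of the pentagon established by Lovász, which shows why auxiliary channels are needed to improve the bound.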



Now, as in the classical case, it is obvious that different classical-quantum channels can share the same confusability graph and thus have the same zero-error capacity $C_0$. Hence, the best upper bound that we can obtain for $C_0$ is not in general obtained with the original channel but with some auxiliary channels which, following Lovász, could be seen as representations of the channel confusability graph. Here, a representation of the graph is a set of projectors $\{U_k\}$ such that $U_i U_k = 0$ if symbols $i$ and $k$ cannot be confused. Furthermore, we use an alternative definition of value

$$V_{sp}(\{U_k\}) = \min_f \max_k \log \frac{1}{\mathrm{Tr}(U_k f)} \tag{157}$$

where the minimum is over all density operators $f$. The optimal $f$ will be called again the handle of the representation. We can finally define the quantity^12

$$\vartheta_{sp} = \min_{\{U_k\}} \min_f \max_k \log \frac{1}{\mathrm{Tr}(U_k f)} \tag{158}$$

where $\{U_k\}$ runs over the sets of projectors such that $U_i U_k = 0$ if symbols $i$ and $k$ cannot be confused. We then have the following result.

Theorem 8: For any graph, we have

$$C_0 \leq \vartheta_{sp} \leq \vartheta. \tag{159}$$



Proof: The fact that $\vartheta_{sp} \leq \vartheta$ is obvious, since $\vartheta$ is obtained by restricting the minimization in the definition of $\vartheta_{sp}$ to rank-one projectors $U_k$ and handle $f$. That $C_0 \leq \vartheta_{sp}$ should be clear in light of the above discussion. It is instructive, however, to present an alternative, self-contained proof that does not involve the function $E_{sp}(R)$. This can be done using the same argument used by Lovász but in the more general situation where general projectors are used for the representation in place of the one-dimensional vectors, and a general density operator is used as the handle.

Consider an optimal representation {Uk} and, to a sequence 
of symbols w = (fci, fc2, . . . , fc^r), associate the operator 
(projector) Uw — Uk^® Uk2 • • • (X) C/fc„ . Consider then a set of 
M non-confusable codewords of length N, wi, . . . , wm, and 
their associated projectors , • ■ • , ■ Then, for m ^ m' 
we have 



N 



Tr(Uw„Uw,„,) =l[TTiUk„^^^Uk^, J 



0, 



(160) 
(161) 



'^The defined quantity does not have an evident connection witli the results 
of [49 1 . There, the authors extend the algebraic definition of the Lovasz theta 
function to consider what they call non-commutative graphs. Our definition 
is instead intended for the classical notion of confusability graphs. 



where we have used the property that $\operatorname{Tr}((A \otimes B)(C \otimes D)) = \operatorname{Tr}(AC)\operatorname{Tr}(BD)$ and the fact that, for at least one value of $i$, $U_{k_{m,i}} U_{k_{m',i}} = 0$, since the two codewords are not confusable. Hence, since the $\{U_{w_m}\}$ are orthogonal projectors, we clearly have

$$\sum_{m=1}^{M} U_{w_m} \le \mathbb{1} \tag{162}$$

where $\mathbb{1}$ is the identity operator. Consider now the state $F = f^{\otimes N}$ where $f$ is the handle of the representation $\{U_k\}$. Note that, for each $m$, we have



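$$\operatorname{Tr}(U_{w_m} F) = \prod_{i=1}^{N} \operatorname{Tr}(U_{k_{m,i}} f) \ge e^{-N\vartheta_{sp}}.$$

Hence, we have

\begin{align}
1 &= \operatorname{Tr}(F) \tag{163}\\
&\ge \sum_{m=1}^{M} \operatorname{Tr}(U_{w_m} F) \tag{164}\\
&\ge M e^{-N\vartheta_{sp}}. \tag{165}
\end{align}

Thus, we deduce that $M \le e^{N\vartheta_{sp}}$. This implies that for any $N$, the rate of a zero-error code of length $N$, and thus the capacity of the graph $C_0$, is not larger than $\vartheta_{sp}$. ■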
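The counting argument of the proof can be checked numerically on a small case. The sketch below is an illustration only: it assumes the pentagon confusability graph (symbols $k$ and $k \pm 1 \bmod 5$ are confusable), uses the classical rank-one "umbrella" vectors of Lovász as projectors $U_k$ with a rank-one handle, and takes the length-two code $(i, 2i \bmod 5)$. None of these choices comes from the text above, and the representation is not claimed to achieve $\vartheta_{sp}$.

```python
import itertools
import numpy as np

# Assumed example: pentagon graph, symbols k and k+/-1 (mod 5) are confusable.
phi = np.arccos(5 ** -0.25)                 # cos^2(phi) = 1/sqrt(5)
theta = 2 * np.pi * np.arange(5) / 5
u = np.stack([np.sin(phi) * np.cos(theta),
              np.sin(phi) * np.sin(theta),
              np.full(5, np.cos(phi))], axis=1)   # rows are unit vectors u_k
U = [np.outer(v, v) for v in u]             # rank-one projectors U_k
f = np.diag([0.0, 0.0, 1.0])                # rank-one handle

# Non-confusable symbols (cyclic distance 2) must satisfy U_i U_k = 0.
max_overlap = max(np.abs(U[i] @ U[(i + 2) % 5]).max() for i in range(5))

# Value of this representation: max_k log 1/Tr(U_k f).
value = max(np.log(1.0 / np.trace(Uk @ f)) for Uk in U)

# Zero-error code of length N = 2 with M = 5 codewords (i, 2i mod 5).
code = [(i, (2 * i) % 5) for i in range(5)]
P = [np.kron(U[a], U[b]) for a, b in code]  # projectors U_w of the codewords
max_cross = max(abs(np.trace(P[a] @ P[b]))
                for a, b in itertools.combinations(range(5), 2))

M, N = len(code), 2
bound = np.exp(N * value)                   # counting argument: M <= exp(N * value)
```

In this example the five codeword projectors are mutually orthogonal and the number of codewords meets the bound $e^{N \cdot \text{value}}$ up to rounding.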
One may wonder whether there exists a graph for which $\vartheta_{sp} < \vartheta$. We have not yet found an answer to this question, which is the objective of an ongoing study. It may be worth pointing out that, even if we restrict the search to optimal rank-one representations as in Lovász's case, it is not obvious what happens by allowing $f$ to have rank larger than one. For example, experimental evidence shows that, for a rank-one representation with states $U_k = |u_k\rangle\langle u_k|$, the optimal handle has in some cases rank larger than one if $\langle u_i|u_k\rangle < 0$ for some $i, k$ (see also Theorem 9 below). What we have not yet determined, however, is whether this can happen for an optimal representation achieving $\vartheta_{sp}$.

VII. Classical Channels and Pure-State Channels 

It is useful to separately consider some properties of the sphere-packing bound when computed for classical and for classical-quantum channels. Classical channels can always be described as classical-quantum channels with pairwise commuting states $S_k$, and the sphere-packing bound for these channels is precisely the same as the usual one [9]. So, there is no need to discuss this type of channel in general, and the aim of this section is instead to show that there is an interesting relation between a classical channel and a properly chosen pure-state channel. In order to make this relation clear, we first study the particular form of the sphere-packing bound for pure-state channels. Note that, for these channels, the bound is known to be tight at high rates [13].

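For a channel with pure states $S_k = |\psi_k\rangle\langle\psi_k|$, we simply note that we have $S_k^{1/(1+\rho)} = S_k$ and, hence, the function $E_0(\rho, p)$ can be written in a simplified way. Let

$$S_p = \sum_{k=1}^{K} p_k |\psi_k\rangle\langle\psi_k| \tag{166}$$

be the mixed state generated by the distribution $p$ over the input pure states. Then we can write

\begin{align}
E_0(\rho, p) &= -\log \operatorname{Tr}\left(\sum_{k} p_k S_k^{1/(1+\rho)}\right)^{1+\rho} \tag{167}\\
&= -\log \operatorname{Tr} S_p^{1+\rho} \tag{168}\\
&= -\log \sum_{i} \lambda_i(S_p)^{1+\rho} \tag{169}
\end{align}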



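where the $\lambda_i(S_p)$'s are the eigenvalues of $S_p$. For these channels, the expressions for $R_\infty$ already introduced in the previous section reduce to

$$R_\infty = -\log \min_{p} \lambda_{\max}(S_p) \tag{170}$$

and

\begin{align}
R_\infty &= \min_{f} \max_{k} \log \frac{1}{\operatorname{Tr}(S_k f)} \tag{171}\\
&= \min_{f} \max_{k} \log \frac{1}{\langle\psi_k|f|\psi_k\rangle} \tag{172}
\end{align}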

where the minimum in the latter expression is over all density operators $f$. We note that if the state vectors $\psi_k$ are constructed from the transition probabilities of a classical channel according to equation (28), then the above expression is very similar to what we called $\vartheta(1)$ in the derivation of the umbrella bound of Section III-C. The only difference is that $f$ was a vector there, while it is a density operator here. What happens is that if for the optimizing $p^*$ in (170) the eigenvalue $\lambda_{\max}(S_{p^*})$ has multiplicity one, then the optimal density operator $f$ always has rank one, and thus actually $\vartheta(1) = R_\infty$. If the eigenvalue has multiplicity larger than one, instead, then the optimal $f$ may have rank larger than one and in that case $R_\infty$ can be strictly smaller than $\vartheta(1)$. We next show that for the choice of vectors $\psi_k$ indicated in equation (28) the two quantities are actually equal. Since we already observed in Section III-C that $\vartheta(1)$ is the cut-off rate of the classical channel, we deduce the interesting identity between the cut-off rate of a classical channel and the rate $R_\infty$ of a pure-state classical-quantum channel defined by means of (28).

In order to prove this result, we may prove that, for this particular choice of vectors $\psi_k$, the minimizing $f$ in the expression of $R_\infty$ can be chosen to be a rank-one operator. This would immediately show that $R_\infty = \vartheta(1)$ and thus that this is also the cut-off rate by means of Csiszár's argument. We prefer to give here a different proof which is based on a more general result, which will also be useful in proving a statement that we made in Section III-C about the so-called non-negative channels.

Theorem 9: If the vectors $\psi_k$ of a pure-state channel satisfy $\langle\psi_i|\psi_k\rangle \ge 0$, $\forall i, k$, then we have

$$R_\infty = \max_{p} -\log \sum_{i,k} p_i p_k \langle\psi_i|\psi_k\rangle. \tag{173}$$

Furthermore, $R_\infty$ equals the value $V(\{\psi_k\})$ in Lovász's sense, that is, the optimal $f$ in equation (172) can be chosen to have rank one.



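Proof: We start with the expression

\begin{align}
R_\infty &= -\log \min_{p} \lambda_{\max}(S_p) \tag{174}\\
&= -\log \min_{p} \lambda_{\max}\left(\sum_{k=1}^{K} p_k |\psi_k\rangle\langle\psi_k|\right). \tag{175}
\end{align}

Define a diagonal matrix $D_p$ with the vector $p$ on its diagonal and a matrix $\Psi$ with the vectors $\psi_k$ in its columns. Observe that

\begin{align*}
\lambda_{\max}\left(\sum_{k=1}^{K} p_k |\psi_k\rangle\langle\psi_k|\right) &= \lambda_{\max}(\Psi D_p \Psi^*)\\
&= \lambda_{\max}(\Psi \sqrt{D_p}\sqrt{D_p}\,\Psi^*)\\
&= \lambda_{\max}\big((\Psi\sqrt{D_p})(\Psi\sqrt{D_p})^*\big)\\
&= \lambda_{\max}\big((\Psi\sqrt{D_p})^*(\Psi\sqrt{D_p})\big)
\end{align*}

so that

\begin{align*}
R_\infty &= -\log \min_{p} \lambda_{\max}\big((\Psi\sqrt{D_p})^*(\Psi\sqrt{D_p})\big)\\
&= -\log \min_{p} \max_{\|v\|=1} \langle v|(\Psi\sqrt{D_p})^*(\Psi\sqrt{D_p})|v\rangle
\end{align*}

where the maximum is now over all unit norm vectors $v$ of dimension $K$.

Since the matrix $A_p = (\Psi\sqrt{D_p})^*(\Psi\sqrt{D_p})$ has components $A_p(i,j) = \sqrt{p_i}\sqrt{p_j}\langle\psi_i|\psi_j\rangle \ge 0$, it is not difficult to see that the maximum can always be attained by a vector $v$ with non-negative components. This implies that we can write

\begin{align}
R_\infty &= -\log \min_{p} \max_{q} \sum_{i,j} \sqrt{q_i}\sqrt{q_j}\, A_p(i,j) \tag{176}\\
&= -\log \min_{p} \max_{q} \sum_{i,j} \sqrt{q_i}\sqrt{q_j}\sqrt{p_i}\sqrt{p_j}\,\langle\psi_i|\psi_j\rangle \tag{177}
\end{align}

where $q$ runs over probability distributions.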



The sum in the last expression is a concave function of both $p$ and $q$. In fact, considering for example the behaviour in $p$, each term of the sum is a concave function of $p$ since we have

$$\sqrt{\alpha p_{1i} + (1-\alpha)p_{2i}}\,\sqrt{\alpha p_{1j} + (1-\alpha)p_{2j}} \ge \alpha\sqrt{p_{1i}}\sqrt{p_{1j}} + (1-\alpha)\sqrt{p_{2i}}\sqrt{p_{2j}}$$

by the Cauchy-Schwarz inequality. Thus, it is not possible here to exchange the maximum and the minimum, as we did in the previous expressions for $R_\infty$, without further considerations. We proceed in a less conventional way, directly proving the optimality of the pair of distributions $p$ and $q$ both equal to

$$p^* = \arg\min_{p} \sum_{i,j} p_i p_j \langle\psi_i|\psi_j\rangle. \tag{178}$$

First note that the sum in the last expression is a convex function of $p$, since it is a quadratic form with nonnegative definite kernel matrix. This implies that $p^*$ can be determined by applying the usual Kuhn-Tucker conditions, which after some calculations (see also [37]) lead to

$$\sum_{j} p_j^* \langle\psi_k|\psi_j\rangle \ge \sum_{i,j} p_i^* p_j^* \langle\psi_i|\psi_j\rangle, \quad \forall k \tag{179}$$

with equality if $p_k^* > 0$.

Now, in order to prove that $q = p = p^*$ solves the min-max problem of eq. (177), consider the conditions for optimality of $q$ given a fixed $p$. Since, as already observed, the function to maximize is concave, we can apply the usual Kuhn-Tucker conditions, which lead after some calculation to the conclusion that $q$ is optimal for a given $p$ if and only if

$$\sum_{j} \frac{1}{\sqrt{q_k}}\sqrt{q_j}\sqrt{p_k}\sqrt{p_j}\,\langle\psi_k|\psi_j\rangle \le \sum_{i,j} \sqrt{q_i}\sqrt{q_j}\sqrt{p_i}\sqrt{p_j}\,\langle\psi_i|\psi_j\rangle, \quad \forall k \tag{180}$$

with equality if $q_k > 0$. Note that the inequality is surely strict if $p_k = 0$, which implies that the condition is always satisfied for all $k$ such that $p_k = 0$ and also that, for such $k$, the optimal $q$ has $q_k = 0$. Also, $q_k = 0$ is optimal only if $p_k = 0$, for otherwise the associated condition is not met. Hence, $q_k = 0$ if and only if $p_k = 0$. Now we check if $q = p$ is optimal for the maximization. Substituting $p$ for $q$ we get the following conditions

\begin{align}
\sum_{j} p_j \langle\psi_k|\psi_j\rangle &= \sum_{i,j} p_i p_j \langle\psi_i|\psi_j\rangle && \text{if } p_k > 0 \tag{181}\\
0 &\le \sum_{i,j} p_i p_j \langle\psi_i|\psi_j\rangle && \text{if } p_k = 0. \tag{182}
\end{align}

Comparing with eq. (179), we thus see that $q = p$ is optimal if $p = p^*$, since all the conditions of eq. (181) are satisfied with equality for all $k$ such that $p_k > 0$ (and thus $q_k > 0$), while the conditions in (182) are always trivially satisfied.

Thus, $q = p$ is optimal for $p = p^*$. This automatically implies that $p^*$ is optimal since, for any $p$, we have

\begin{align*}
\max_{q} \sum_{i,j} \sqrt{q_i}\sqrt{q_j}\sqrt{p_i}\sqrt{p_j}\,\langle\psi_i|\psi_j\rangle &\ge \sum_{i,j} \sqrt{p_i}\sqrt{p_j}\sqrt{p_i}\sqrt{p_j}\,\langle\psi_i|\psi_j\rangle\\
&= \sum_{i,j} p_i p_j \langle\psi_i|\psi_j\rangle\\
&\ge \sum_{i,j} p_i^* p_j^* \langle\psi_i|\psi_j\rangle.
\end{align*}

Thus, for the choice $p = p^*$ we have the optimal $q = p^*$ and thus

\begin{align}
R_\infty &= -\log \min_{p} \sum_{i,j} p_i p_j \langle\psi_i|\psi_j\rangle \tag{183}\\
&= \max_{p} -\log \sum_{i,j} p_i p_j \langle\psi_i|\psi_j\rangle \tag{184}
\end{align}

as was to be proven.

We only need now to prove that $R_\infty = V(\{\psi_k\})$, which means that the optimal $f$ in equation (172) can be chosen to have rank one, or, in other words, that there exists a unit norm vector $f$ satisfying

$$|\langle\psi_k|f\rangle|^2 \ge e^{-R_\infty}, \quad \forall k. \tag{185}$$

In order to do this, consider the conditions in equation (179), which are satisfied for the optimal $p$ achieving $R_\infty$. Note that the right hand side of (179) is precisely the value $e^{-R_\infty}$ and that it can be written as $\|\sum_i p_i^* \psi_i\|^2$. Hence, if we define

$$f = \frac{\sum_i p_i^* \psi_i}{\left\|\sum_i p_i^* \psi_i\right\|} \tag{186}$$

the conditions of equation (179) can be written as

$$\langle\psi_k|f\rangle \ge \Big\|\sum_i p_i^* \psi_i\Big\| = e^{-R_\infty/2}. \tag{187}$$

This implies that $f$ satisfies $|\langle\psi_k|f\rangle|^2 \ge e^{-R_\infty}$, as desired. ■
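The construction of the rank-one handle in the last step of the proof can be illustrated numerically. The following is a minimal sketch under stated assumptions: three real unit vectors with a common overlap `b` (a hypothetical value), for which the minimizer $p^*$ of the quadratic form is uniform by symmetry. It checks the Kuhn-Tucker conditions (179), the handle property (185), and the agreement with the eigenvalue expression (170).

```python
import numpy as np

b = 0.3                                          # assumed common overlap <psi_i|psi_k>, i != k
G = (1 - b) * np.eye(3) + b * np.ones((3, 3))    # Gram matrix with non-negative entries
Psi = np.linalg.cholesky(G).T                    # columns psi_k, so Psi.T @ Psi = G

p_star = np.full(3, 1 / 3)                       # minimizer of p^T G p (uniform by symmetry)
quad = p_star @ G @ p_star                       # equals exp(-R_inf) according to (183)
R_inf = -np.log(quad)

# Kuhn-Tucker conditions (179): left-hand side for each k
kkt_lhs = G @ p_star

# Rank-one handle of (186): normalized sum_i p*_i psi_i
g = Psi @ p_star
f = g / np.linalg.norm(g)
overlaps_sq = (Psi.T @ f) ** 2                   # |<psi_k|f>|^2 for each k

# Cross-check with (170): R_inf = -log lambda_max(S_p*)
S = Psi @ np.diag(p_star) @ Psi.T
lam_max = np.linalg.eigvalsh(S)[-1]
```

In this symmetric example all conditions hold with equality, so every state has the same overlap $e^{-R_\infty}$ with the handle.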

Remark 3: The condition $\langle\psi_i|\psi_k\rangle \ge 0$, $\forall i, k$, is by no means necessary. If a vector $\psi_k$ is substituted with $-\psi_k$, the density matrix $S_p$ will not change, while some scalar products $\langle\psi_i|\psi_j\rangle$ will become negative. However, note that all the signs of the scalar products $\langle\psi_i|\psi_j\rangle$ with $j = k$ or $i = k$ will change.

Corollary 2: The cut-off rate of a classical channel with transition probabilities $P(j|k)$, $k = 1, \ldots, K$, $j = 1, \ldots, J$, equals the rate $R_\infty$ of any classical-quantum pure-state channel with states $S_k = |\psi_k\rangle\langle\psi_k|$ such that

$$\langle\psi_i|\psi_k\rangle = \sum_{j} \sqrt{P(j|i)P(j|k)}. \tag{188}$$

In particular we have the following expression for the cut-off rate

$$-\log \min_{p} \lambda_{\max}\left(\sum_{k=1}^{K} p_k |\psi_k\rangle\langle\psi_k|\right). \tag{189}$$
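The identity stated in the corollary can be checked numerically on a small case. The sketch below assumes a hypothetical binary-input, ternary-output channel (the matrix `P` is made up for illustration), builds the vectors $\psi_k$ with components $\sqrt{P(j|k)}$, and compares the usual cut-off rate expression with the spectral expression by a plain grid search over input distributions $p = (t, 1-t)$.

```python
import numpy as np

# Hypothetical channel: rows are the distributions P(.|k) for k = 0, 1.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])
Psi = np.sqrt(P).T                 # columns psi_k with components sqrt(P(j|k))
G = Psi.T @ Psi                    # G[i, k] = sum_j sqrt(P(j|i) P(j|k))

ts = np.linspace(0.0, 1.0, 2001)   # grid over input distributions p = (t, 1 - t)
quad_vals, eig_vals = [], []
for t in ts:
    p = np.array([t, 1.0 - t])
    quad_vals.append(p @ G @ p)                      # sum_{i,k} p_i p_k <psi_i|psi_k>
    S_p = Psi @ np.diag(p) @ Psi.T                   # mixed state S_p
    eig_vals.append(np.linalg.eigvalsh(S_p)[-1])     # lambda_max(S_p)

cutoff_classical = -np.log(min(quad_vals))   # max_p -log sum_{i,k} p_i p_k <psi_i|psi_k>
cutoff_spectral = -np.log(min(eig_vals))     # -log min_p lambda_max(S_p)
```

A grid search is used only to keep the example dependency-free; for larger input alphabets a convex solver would be the natural choice.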



Remark 4: Note that the classical channel can be obtained from such a pure-state channel if a separable orthogonal measurement is used. Hence, any such pure-state channel can be interpreted as an underlying channel upon which the classical one is built. It is worth comparing this result with the known properties of the cut-off rate in the contexts of sequential decoding [36], [34] and list decoding [50]. We have not yet studied this analogy, but we believe it deserves further consideration.

We close this section with a bound on the possible values taken by $E_{sp}(R)$ for pure-state channels, which will also be useful in the next section. Since $E_{sp}(R)$ is finite and non-increasing for all $R > R_\infty$, it would be interesting to evaluate $E_{sp}(R_\infty^+)$, since this represents the largest finite value of the function $E_{sp}(R)$. Unfortunately, it is not easy in general to find this precise value. For the purpose of this paper, however, the following upper bound will be useful.

Theorem 10: For any pure-state channel we have

$$E_{sp}(R_\infty^+) \le R_\infty. \tag{190}$$

Proof: Let $p^*$ be the optimal $p$ in (170). Then we have

\begin{align}
E_{sp}(R_\infty^+) &= \sup_{\rho \ge 0} \left[E_0(\rho) - \rho R_\infty\right] \tag{191}\\
&= \sup_{\rho \ge 0} \left[-\log \min_{p} \sum_{i} \lambda_i^{1+\rho}(S_p) + \rho \log \lambda_{\max}(S_{p^*})\right]. \tag{192}
\end{align}

For each $\rho$ and each $p$, we have

\begin{align*}
\sum_{i} \lambda_i^{1+\rho}(S_p) &\ge \lambda_{\max}^{1+\rho}(S_p)\\
&\ge \lambda_{\max}^{1+\rho}(S_{p^*}),
\end{align*}

since $p^*$ minimizes $\lambda_{\max}(S_p)$. Hence,

\begin{align*}
E_{sp}(R_\infty^+) &\le \sup_{\rho \ge 0} \left[-\log \lambda_{\max}^{1+\rho}(S_{p^*}) + \rho \log \lambda_{\max}(S_{p^*})\right]\\
&= -\log \lambda_{\max}(S_{p^*})\\
&= R_\infty.
\end{align*}

■
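A quick numerical look at the bound of Theorem 10 for the simplest case may help. The sketch below assumes a pure-state channel with two states of overlap `b` (a hypothetical value) and takes the uniform input distribution as the optimizer for both $E_0(\rho)$ and $R_\infty$, which is an assumption justified here only by the symmetry of the two-state problem. It evaluates $E_0(\rho) - \rho R_\infty$ on a grid of $\rho$.

```python
import numpy as np

b = 0.5                                        # assumed overlap |<psi_1|psi_2>|
lam = np.array([(1 + b) / 2, (1 - b) / 2])     # eigenvalues of S_p for p = (1/2, 1/2)
R_inf = -np.log(lam.max())                     # expression (170) at the uniform p

rhos = np.linspace(0.0, 200.0, 2001)
# E_0(rho) - rho * R_inf, with E_0(rho) = -log sum_i lambda_i^(1 + rho)
vals = np.array([-np.log(np.sum(lam ** (1 + r))) - r * R_inf for r in rhos])
largest = vals.max()                           # grid estimate of the supremum in (191)
```

In this example the values increase with $\rho$ and approach $R_\infty$ from below, so the bound is met only in the limit of large $\rho$.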

Remark 5: The bound is tight, in the sense that $E_{sp}(R_\infty^+) = R_\infty$ is possible. For example, if the optimal $p$ in (170) is such that $\lambda_{\max}(S_p)$ has multiplicity one, and if the value $E_{sp}(R_\infty^+)$ is attained as $\rho \to \infty$ (see the footnote below), then by inspection of the proof we notice that

$$E_{sp}(R_\infty^+) = R_\infty.$$

Footnote: This is not obvious in general. For general channels, the function $E_0(\rho)$ is not necessarily concave, and this implies that $E_{sp}(R_\infty^+)$ may be obtained for a finite $\rho$. We conjecture, however, that $E_0(\rho)$ is concave for pure-state channels.

VIII. Sphere-Packed Umbrella Bound

In this section we consider again the umbrella bound of Section III and we extend it to a more general bound by means of the sphere-packing bound. While the original idea was to bound the performance of a classical channel by means of auxiliary representations, which were in the end auxiliary pure-state classical-quantum channels, we expand it here to obtain an umbrella bound for a general classical-quantum channel by means of an auxiliary general classical-quantum channel.

Given a classical-quantum channel $C$ with density operators $S_1, S_2, \ldots, S_K$ and given a fixed $\rho \ge 1$, consider an auxiliary classical-quantum channel $\tilde{C}$ with states $\tilde{S}_1, \tilde{S}_2, \ldots, \tilde{S}_K$ such that

$$\operatorname{Tr}\sqrt{\tilde{S}_i}\sqrt{\tilde{S}_k} \le \left(\operatorname{Tr}\left|\sqrt{S_i}\sqrt{S_k}\right|\right)^{2/\rho} \tag{193}$$

where $|A| = \sqrt{A^* A}$. We call such a channel an admissible auxiliary channel of degree $\rho$ and we call $\Gamma(\rho)$ the set of all such channels. For a fixed auxiliary channel, let $\tilde{E}(R)$ be its reliability function and let $\tilde{E}_{sp}(R)$ be the sphere-packing bound.

To any sequence of $N$ input symbols $w = (k_1, k_2, \ldots, k_N)$, we can associate a signal state $\mathbf{S}_w = S_{k_1} \otimes S_{k_2} \otimes \cdots \otimes S_{k_N}$ for the original channel and a signal state $\tilde{\mathbf{S}}_w = \tilde{S}_{k_1} \otimes \tilde{S}_{k_2} \otimes \cdots \otimes \tilde{S}_{k_N}$ for the auxiliary channel. Thus, to a set of $M$ codewords $w_1, \ldots, w_M$, with a simplified notation we will associate vector states $\mathbf{S}_1, \ldots, \mathbf{S}_M$ for the original channel and vector states $\tilde{\mathbf{S}}_1, \ldots, \tilde{\mathbf{S}}_M$ for the auxiliary channel. We will then bound the probability of error $P_{e,\max}$ of the original channel $C$ by bounding $\tilde{P}_{e,\max}$ of the auxiliary channel $\tilde{C}$.

Consider the auxiliary channel codeword states. It was proved by Burnashev and Holevo [16] that for such a set of states, there exists a measurement with probability of error for the $m$-th message bounded as

$$\tilde{P}_{e|m} \le \sum_{m' \ne m} \operatorname{Tr}\sqrt{\tilde{\mathbf{S}}_m}\sqrt{\tilde{\mathbf{S}}_{m'}}. \tag{194}$$

This obviously implies that

\begin{align}
\tilde{P}_{e,\max} &\le (M-1) \max_{m \ne m'} \operatorname{Tr}\sqrt{\tilde{\mathbf{S}}_m}\sqrt{\tilde{\mathbf{S}}_{m'}} \tag{195}\\
&\le e^{NR} \max_{m \ne m'} \operatorname{Tr}\sqrt{\tilde{\mathbf{S}}_m}\sqrt{\tilde{\mathbf{S}}_{m'}}. \tag{196}
\end{align}

Hence, asymptotically in the block length $N$, we find that

\begin{align}
\max_{m \ne m'} \operatorname{Tr}\sqrt{\tilde{\mathbf{S}}_m}\sqrt{\tilde{\mathbf{S}}_{m'}} &\ge \tilde{P}_{e,\max}\, e^{-NR} \tag{197}\\
&\ge e^{-N(\tilde{E}_{sp}(R) + o(1))} e^{-NR} \tag{198}
\end{align}

and hence

$$\frac{1}{N} \min_{m \ne m'} \log \frac{1}{\operatorname{Tr}\sqrt{\tilde{\mathbf{S}}_m}\sqrt{\tilde{\mathbf{S}}_{m'}}} \le \tilde{E}_{sp}(R) + R + o(1). \tag{199}$$

Considering the original states $\mathbf{S}_m$, we deduce from (193) that

$$\frac{1}{N} \min_{m \ne m'} \log \frac{1}{\operatorname{Tr}\left|\sqrt{\mathbf{S}_m}\sqrt{\mathbf{S}_{m'}}\right|} \le \frac{\rho}{2}\left[\tilde{E}_{sp}(R) + R\right] + o(1). \tag{200}$$

The left hand side is precisely the minimum fidelity distance between codewords $\min_{m \ne m'} d_F(m, m')$. Using Theorem 12 and the property that $d_C(m, m') \le 2 d_F(m, m')$, we then deduce

$$E(R) \le \rho\left(\tilde{E}_{sp}(R) + R\right). \tag{201}$$

For particular types of channels, this last step can be tightened. For example, if the original channel is a pairwise reversible classical channel, then we can use the relation $d_C(m, m') = d_F(m, m')$ which holds in that case and thus state

$$E(R) \le \frac{\rho}{2}\left(\tilde{E}_{sp}(R) + R\right). \tag{202}$$

In general, however, it would be better to change the definition of $\Gamma(\rho)$ in order to take into account these types of particular cases. We have not yet investigated the topic in this direction and we thus only focus on the general case.

Clearly, for any rate $R$, the parameter $\rho$ and the auxiliary channel $\tilde{C}$ can be chosen optimally. We thus have the following result.

Theorem 11: The reliability function $E(R)$ is upper bounded by the function $E_{spu}(R)$, where

$$E_{spu}(R) = \inf_{\rho \ge 1,\ \tilde{C} \in \Gamma(\rho)} \rho\left[\tilde{E}_{sp}(R) + R\right]. \tag{203}$$



A precise evaluation of this bound is not trivial. For a given $R$, one should find the optimal pair $(\rho, \tilde{C})$ and this is in general a complex task, which gives rise to interesting optimization problems. A complete treatment of this topic is still under investigation and will hopefully be detailed in a future work. It is important, however, to consider here the particular case of the bound used for classical channels by means of pure-state channels, in order to complete our interpretation of the umbrella bound given in Section III as a special case of this bound.


Suppose the states $S_k$ commute, which means that the original channel is a classical one, and assume that we restrict the set of possible admissible auxiliary channels to the pure-state ones. First observe that for commuting states $S_k$ we have $\operatorname{Tr}|\sqrt{S_i}\sqrt{S_k}| = \operatorname{Tr}\sqrt{S_i}\sqrt{S_k}$ while, for pure states $\tilde{S}_k = |\tilde\psi_k\rangle\langle\tilde\psi_k|$, we have $\operatorname{Tr}\sqrt{\tilde{S}_i}\sqrt{\tilde{S}_k} = |\langle\tilde\psi_i|\tilde\psi_k\rangle|^2$. Hence, for a classical channel, the restriction of $\Gamma(\rho)$ to pure-state channels precisely corresponds to the set of admissible representations of degree $\rho$ in the sense of Section III-B. Then, for a fixed $\rho$ and for a fixed auxiliary channel, it is interesting to study the values of $R$ for which the bound is finite. This is of course the point $\tilde{R}_\infty$ where $\tilde{E}_{sp}(R)$ diverges and, from our previous analysis, we have

\begin{align}
\tilde{R}_\infty &= \min_{f} \max_{k} \log \frac{1}{\langle\tilde\psi_k|f|\tilde\psi_k\rangle} \tag{204}\\
&= -\log \min_{p} \lambda_{\max}\left(\sum_{k} p_k |\tilde\psi_k\rangle\langle\tilde\psi_k|\right). \tag{205}
\end{align}

We see that this is almost the same expression as the value of the representation $V(\{\tilde\psi_k\})$, and it is precisely the same value if for example $\langle\tilde\psi_i|\tilde\psi_k\rangle \ge 0$, $\forall i, k$, as proved in Theorem 9.

It would be interesting to evaluate $\tilde{E}_{sp}(\tilde{R}_\infty^+)$ but, as mentioned in the previous section, this is not easy in general. By Theorem 10, however, we know that $\tilde{E}_{sp}(\tilde{R}_\infty^+) \le \tilde{R}_\infty$. This implies that for $R = \tilde{R}_\infty$, the right hand side of equation (201) is not larger than $2\rho\tilde{R}_\infty$. Exploiting the fact that $E(R)$ is surely non-increasing, we then deduce the bound

$$E(R) \le 2\rho\tilde{R}_\infty, \qquad R > \tilde{R}_\infty, \tag{206}$$

which is to be compared with (40) with $\tilde{R}_\infty$ in place of $\vartheta(\rho)$. Considering the expression (172) for $\tilde{R}_\infty$ and optimizing over the pure-state auxiliary channels, we thus see that the umbrella bound derived in Section III is included as a particular case of the more general one derived in this section. In general, however, for a given representation, $\tilde{E}_{sp}(\tilde{R}_\infty^+)$ can be strictly smaller than $\tilde{R}_\infty$. Furthermore, the function $\tilde{E}_{sp}(R)$ has in many cases slope $-\infty$ in $\tilde{R}_\infty$ and thus the bound of equation (203), for a given $R$, is in those cases not optimized for the value of $\rho$ which leads to $\tilde{R}_\infty = R$, but for a larger one, which leads to $\tilde{R}_\infty < R$, whenever possible.

An interesting question at this point, which we have not yet been able to answer, is whether the optimal representation that minimizes $\tilde{R}_\infty$ always leads to $\tilde{R}_\infty = \vartheta(\rho)$ or if $\tilde{R}_\infty < \vartheta(\rho)$ is possible. A partial result is given by Theorem 9, which tells us that in order to have $\tilde{R}_\infty < \vartheta(\rho)$ some of the scalar products $\langle\tilde\psi_i|\tilde\psi_k\rangle$ must be negative. We have not yet obtained a conclusive answer to this question for the general case. We point out that even performing numerical experiments is not as simple as it may appear. While finding the optimal representation in terms of $V(\{\tilde\psi_k\})$, as mentioned in Section VI, is a relatively simple semidefinite programming problem, finding the optimal representation in terms of $\tilde{R}_\infty$ is a more complicated task since it involves a quadratic constraint.

IX. Extension of Some Classical Low Rate Bounds to Classical-Quantum Channels

As anticipated in the previous sections, the sphere-packing 
bound is in general not tight at low rates. For example, it 
is infinite over a non-empty range of positive small rates for 
all non trivial pure-state channels, even if there is no pair of 
orthogonal states in the channel input signal alphabet, which 
implies that the zero-error capacity of the channel is zero. In 
this Section, we deal precisely with channels with no zero- 
error capacity. For these channels, in the classical case some 
bounds that greatly improve the sphere-packing bound were 
derived and the main objective of this section is to consider 
some possible extensions of these results to the classical- 
quantum case. 



A first interesting result that extends a low rate bound from the classical to the classical-quantum setting was already obtained in [16] for the case of pure-state channels. There, the authors proved the equivalent of the zero-rate upper bound to $E(R)$ derived in [10] for the case of pure-state channels with no zero-error capacity, thus proving that even in this case the expurgated bound is tight at zero rate. For general classical-quantum channels, a similar result was attempted in [14], but the obtained lower bound to the probability of error at zero rate does not coincide with the limiting value of the expurgated bound in this case.

In this section, we first present the extension of this zero-rate upper bound [31], [10] to the case of mixed-state channels, which leads to the determination of the exact value of the reliability function at zero rate. Then, we also discuss some other bounds. In particular, we consider the application of Blahut's bound [38] to the case of pure and mixed-state channels. We analyze the relation between these two bounds, also clarifying the domain of applicability of Blahut's bound in the classical case.

A recurring theme in the study of the reliability of classical and classical-quantum channels is the fact that, at low rates, the probability of error is dominated by the worst pair of codewords in the code. At high rates, it is important to bound the probability that a message is decoded wrongly due to a bulk of competitors. The auxiliary state $f$ used in the proof of the sphere-packing bound serves precisely this purpose and represents this bulk of competitors. At low rates, instead, there are essentially only a few competitors (we may conjecture just one) which are responsible for almost all the probability of error. Thus, in the low rate region, it is important to bound the probability of error in a binary decision between any pair of codewords. For this reason, we need to specialize Theorem 4 to the case of binary hypothesis testing between two codewords, so as to obtain the quantum generalization of [10, Th. 1].

In the context of Theorem 4, thus, let $\varrho = \mathbf{S}_{x_m}$ and $\varsigma = \mathbf{S}_{x_{m'}}$, and call for simplicity $\mu(s) = \mu_{\mathbf{S}_{x_m}, \mathbf{S}_{x_{m'}}}(s)$. Let $s^*$ minimize $\mu(s)$. Then we have $\mu'(s^*) = 0$ and thus either

$$P_{e|m} > \frac{1}{8} \exp\left[\mu(s^*) - s^*\sqrt{2\mu''(s^*)}\right] \tag{207}$$

or

$$P_{e|m'} > \frac{1}{8} \exp\left[\mu(s^*) - (1-s^*)\sqrt{2\mu''(s^*)}\right]. \tag{208}$$

The key point is now to show that the second derivative term is unimportant for large $N$, so that the exponential behaviour is determined by $\mu(s^*)$. If $p_{i,k}(m, m')$ is the joint composition between codewords $x_m$ and $x_{m'}$, we find

$$\mu(s) = N \sum_{i} \sum_{k} p_{i,k}(m, m')\, \mu_{i,k}(s), \tag{209}$$

where, for ease of notation,

$$\mu_{i,k}(s) = \log \operatorname{Tr} S_i^{1-s} S_k^{s}. \tag{210}$$

By definition of $s^*$ we can thus introduce the Chernoff distance between the two messages $m$ and $m'$ (this corresponds to the discrepancy $D(m, m')$ in [10])

\begin{align}
d_C(m, m') &= \frac{1}{N}\, d_C(\mathbf{S}_{x_m}, \mathbf{S}_{x_{m'}}) \tag{211}\\
&= -\min_{0 \le s \le 1} \sum_{i} \sum_{k} p_{i,k}(m, m')\, \mu_{i,k}(s). \tag{212}
\end{align}



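We then have the following generalization of [10, Th. 1].

Theorem 12: If $w_m$ and $w_{m'}$ are two codewords in a code of block length $N$ for a classical-quantum channel with symbol states $S_1, \ldots, S_K$, then

$$P_{e|m} \ge \frac{1}{8} \exp\left(-N\left[d_C(m, m') + \sqrt{\frac{2}{N}} \log \frac{1}{\lambda_{\min}}\right]\right) \tag{213}$$

or

$$P_{e|m'} \ge \frac{1}{8} \exp\left(-N\left[d_C(m, m') + \sqrt{\frac{2}{N}} \log \frac{1}{\lambda_{\min}}\right]\right) \tag{214}$$

where $\lambda_{\min}$ is the smallest non-zero eigenvalue of the states $S_1, \ldots, S_K$.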

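Proof: The proof is essentially the same as in [10, Th. 1], with the only difference that we have to bound $\mu''_{i,k}(s)$ for $\mu_{i,k}(s)$ computed between density operators rather than probability distributions. Using the spectral decomposition of the density operators

$$S_i = \sum_{h} \lambda_{i,h} |\psi_{i,h}\rangle\langle\psi_{i,h}| \quad \text{and} \quad S_k = \sum_{r} \lambda_{k,r} |\psi_{k,r}\rangle\langle\psi_{k,r}| \tag{215}$$

we can however use again the Nussbaum-Szkoła mapping to define two probability distributions $P_i(h, r)$ and $P_k(h, r)$ as

$$P_i(h, r) = \lambda_{i,h} |\langle\psi_{i,h}|\psi_{k,r}\rangle|^2, \qquad P_k(h, r) = \lambda_{k,r} |\langle\psi_{i,h}|\psi_{k,r}\rangle|^2, \tag{216}$$

so that

$$\mu_{i,k}(s) = \log \sum_{h,r} P_i(h, r)^{1-s} P_k(h, r)^{s}. \tag{217}$$

The proof in [10, Th. 1] can then be applied using our distributions $P_i(\cdot,\cdot)$ and $P_k(\cdot,\cdot)$ for $P(\cdot|i)$ and $P(\cdot|k)$ there, and noticing that in [10, eq. (1.10)] we can use in our case the bound

\begin{align}
\left|\log \frac{P_i(h, r)}{P_k(h, r)}\right| &= \left|\log \frac{\lambda_{i,h}}{\lambda_{k,r}}\right| \tag{218}\\
&\le \log \lambda_{\min}^{-1}. \tag{219}
\end{align}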
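The identity (217) behind the Nussbaum-Szkoła mapping is easy to verify numerically. The sketch below is only an illustration under stated assumptions: the two density operators are random full-rank matrices, and the helper names (`random_density`, `mu_operator`, `nussbaum_szkola`) are hypothetical. It compares $\log \operatorname{Tr} S_i^{1-s} S_k^{s}$ with the classical expression computed from the two distributions of (216).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_density(d):
    """Random full-rank density matrix (illustrative input only)."""
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    A = X @ X.conj().T
    return A / np.trace(A).real

def mu_operator(Si, Sk, s):
    """mu_{i,k}(s) = log Tr S_i^(1-s) S_k^s via spectral decompositions."""
    li, Vi = np.linalg.eigh(Si)
    lk, Vk = np.linalg.eigh(Sk)
    Si_pow = (Vi * li ** (1 - s)) @ Vi.conj().T
    Sk_pow = (Vk * lk ** s) @ Vk.conj().T
    return np.log(np.trace(Si_pow @ Sk_pow).real)

def nussbaum_szkola(Si, Sk):
    """Distributions P_i(h, r) and P_k(h, r) as in (216)."""
    li, Vi = np.linalg.eigh(Si)
    lk, Vk = np.linalg.eigh(Sk)
    overlap = np.abs(Vi.conj().T @ Vk) ** 2      # |<psi_{i,h}|psi_{k,r}>|^2
    return li[:, None] * overlap, lk[None, :] * overlap

Si, Sk = random_density(3), random_density(3)
Pi, Pk = nussbaum_szkola(Si, Sk)
s = 0.3
mu_quantum = mu_operator(Si, Sk, s)
mu_classical = np.log(np.sum(Pi ** (1 - s) * Pk ** s))
```

Both arrays `Pi` and `Pk` sum to one, so they are valid joint distributions over the index pairs $(h, r)$.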



By considering all possible pairs of codewords, it is clear that the optimal exponential behaviour of $P_{e,\max}$ is governed by the minimum discrepancy between codewords $D_{\min}$ of an optimal code with $M$ codewords and block length $N$. Theorem 12 implies that the results on the zero-rate reliability of [10, Th. 3-4] apply straightforwardly to the classical-quantum case. These results are related to an upper bound to the reliability function in the low rate region derived by Blahut [38]. A clarification of this relation and of extensions to the classical-quantum case is the objective of the next part of this section.

In view of that, it is useful to recall here the relation between the Chernoff, the Bhattacharyya and the Uhlmann distances discussed. If we define the Bhattacharyya and the Uhlmann distances between two messages $m$ and $m'$ as

\begin{align}
d_B(m, m') &= \sum_{i} \sum_{k} p_{i,k}(m, m')\, d_B(S_i, S_k) \tag{220}\\
d_F(m, m') &= \sum_{i} \sum_{k} p_{i,k}(m, m')\, d_F(S_i, S_k) \tag{221}
\end{align}

then we see that the following hold

$$d_F(m, m') \le d_B(m, m') \le d_C(m, m') \le 2 d_F(m, m') \le 2 d_B(m, m'). \tag{222}$$

It is useful to investigate conditions under which $d_C(m, m')$ can be expressed exactly in terms of $d_B(m, m')$ or $d_F(m, m')$. One case is the case of pairwise reversible channels as defined in [10], that is, channels such that the function $\mu_{i,k}(s)$ is minimized at $s = 1/2$ for all $i$ and $k$. This condition is discussed for classical channels in [10]. For classical-quantum channels it holds for example for all pure-state channels, for which $\mu_{i,k}(s)$ is constant for all $i$ and $k$. Another important case is the case where codewords $m$ and $m'$ have a symmetric joint composition, that is $p_{i,k}(m, m') = p_{k,i}(m, m')$ for all $i, k$, for in that case the function $\mu(s)$ is symmetric around $s = 1/2$, due to the fact that $\mu_{i,k}(s) = \mu_{k,i}(1 - s)$. In those cases, the discrepancy $d_C(m, m')$ can be replaced by the closed form expression of $d_B(m, m')$. In the general case, however, for a single pair of codewords, it is not possible to use $d_B(m, m')$ in place of $d_C(m, m')$ for lower bounding the probability of error, and it can be proved that in some cases $d_C(m, m') = 2 d_B(m, m')$.
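The ordering of the three distances can be observed numerically for a single pair of states. The sketch below assumes the definitions $d_B = -\log \operatorname{Tr}\sqrt{A}\sqrt{B}$, $d_F = -\log \operatorname{Tr}|\sqrt{A}\sqrt{B}|$ and $d_C = -\min_s \log \operatorname{Tr} A^{1-s}B^{s}$, uses random full-rank density matrices, and approximates the minimum over $s$ on a grid. The function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_density(d):
    """Random full-rank density matrix (illustrative input only)."""
    X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    A = X @ X.conj().T
    return A / np.trace(A).real

def mpow(S, t):
    """Fractional power of a positive semidefinite matrix."""
    lam, V = np.linalg.eigh(S)
    lam = np.clip(lam, 0.0, None)
    return (V * lam ** t) @ V.conj().T

def d_B(A, B):
    """Bhattacharyya distance: -log Tr sqrt(A) sqrt(B)."""
    return -np.log(np.trace(mpow(A, 0.5) @ mpow(B, 0.5)).real)

def d_F(A, B):
    """Fidelity (Uhlmann) distance: -log Tr |sqrt(A) sqrt(B)|."""
    sv = np.linalg.svd(mpow(A, 0.5) @ mpow(B, 0.5), compute_uv=False)
    return -np.log(sv.sum())

def d_C(A, B, grid=np.linspace(0.0, 1.0, 1001)):
    """Chernoff distance, with the minimum over s taken on a grid."""
    return -min(np.log(np.trace(mpow(A, 1 - s) @ mpow(B, s)).real) for s in grid)

A, B = random_density(3), random_density(3)
dF, dB, dC = d_F(A, B), d_B(A, B), d_C(A, B)
```

For commuting states the singular values in `d_F` coincide with the eigenvalues of $\sqrt{A}\sqrt{B}$, so `d_F` and `d_B` agree, in line with the classical case.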

We are now in a position to consider the low rate upper bounds to the reliability function derived in [10] and in [38], discussing their applicability for classical and classical-quantum channels. For classical channels, a low rate improvement of the sphere-packing bound for channels with no zero-error capacity was obtained in [10]. This bound is based on two important results:

1) A zero rate bound [10, Th. 4], first derived by Berlekamp in his PhD thesis [31, Ch. 2], which asserts that, for a discrete memoryless channel with transition probabilities $P(j|i)$,

$$E(0^+) \le \max_{p} -\sum_{i} \sum_{k} p_i p_k \log \sum_{j} \sqrt{P(j|i)P(j|k)}. \tag{223}$$

The right hand side of the above equation is also the value of the expurgated bound of Gallager as $R \to 0$, and this implies that the bound is tight and that the expression determines the reliability function at zero rate.

2) A straight line bound [10, Th. 6], attributed to Shannon and Gallager in Berlekamp's thesis [31, p. 6], which asserts that, given an upper bound $E_{lr}(R)$ to the reliability function which is tighter than $E_{sp}(R)$ at low rates, it is possible to combine $E_{lr}(R)$ and $E_{sp}(R)$ to obtain an improved upper bound on $E(R)$ by drawing a straight line between any two points on the curves $E_{lr}(R)$ and $E_{sp}(R)$.

Combining these two results, the authors obtain an upper bound to $E(R)$ which is strictly better than the sphere-packing bound for low rates and is tight at rate $R = 0$.
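As a concrete instance of the zero-rate expression (223), the sketch below evaluates it for a binary symmetric channel with crossover probability `eps` (an assumed example), maximizing over input distributions $p = (t, 1-t)$ on a grid and comparing with the closed form $-\frac{1}{4}\log(4\epsilon(1-\epsilon))$ obtained by setting $t = 1/2$.

```python
import numpy as np

eps = 0.1                                   # assumed crossover probability
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])              # rows are P(.|i)
B = np.sqrt(P) @ np.sqrt(P).T               # B[i, k] = sum_j sqrt(P(j|i) P(j|k))
logB = np.log(B)

ts = np.linspace(0.0, 1.0, 2001)            # input distributions p = (t, 1 - t)
vals = []
for t in ts:
    p = np.array([t, 1.0 - t])
    vals.append(-(p @ logB @ p))            # -sum_{i,k} p_i p_k log B[i, k]

E_zero = max(vals)                          # right-hand side of (223) on the grid
closed_form = -0.25 * np.log(4 * eps * (1 - eps))
```

The grid maximum is attained at the uniform input distribution, as expected from the symmetry of the channel.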

Of the above two results, we only consider here the extension of the zero-rate bound, a possible generalization of the straight line bound being still under investigation at the moment. For quantum channels, a zero rate bound has been obtained by Burnashev and Holevo [13] for the case of pure-state channels, which essentially parallels the classical result. That is, they proved that in the case of pure states $S_k = |\psi_k\rangle\langle\psi_k|$, if there is no pair of orthogonal states, then

$$E(0^+) \le \max_{p} -\sum_{i} \sum_{k} p_i p_k \log |\langle\psi_i|\psi_k\rangle|^2. \tag{224}$$

As for the classical case, this bound coincides with a lower bound given by the expurgated bound as $R \to 0$, thus providing the exact expression.

For general mixed-state channels, the reliability at zero rate was considered by Holevo [14], who obtained the bound

\begin{align}
\max_{p} -\sum_{i} \sum_{k} p_i p_k \log \operatorname{Tr}\sqrt{S_i}\sqrt{S_k} &\le E(0^+) \notag\\
&\le 2 \max_{p} -\sum_{i} \sum_{k} p_i p_k \log \operatorname{Tr}\left|\sqrt{S_i}\sqrt{S_k}\right|. \tag{225}
\end{align}

Note that, in light of the now clear parallel role of the Rényi divergence in the classical and quantum case, the lower bound is the generalization of (223), while the upper bound is in the general case a larger quantity. Thus, we are inclined to believe that the correct expression should be the first one. This is actually the case, as stated in the following theorem.

Theorem 13: For a general classical-quantum channel with states $S_k$, $k = 1, \ldots, K$, no two of which are orthogonal, the reliability function at zero rate is given by the expression

$$E(0^+) = \max_{p} -\sum_{i} \sum_{k} p_i p_k \log \operatorname{Tr}\sqrt{S_i}\sqrt{S_k}. \tag{226}$$

Proof: This theorem is the quantum equivalent of [10, Th. 4] and it is a direct consequence of Theorem 12. It can be noticed, in fact, that the proof of [10, Th. 4] holds exactly unchanged in this new setting since it only depends on [10, Th. 1] and on the definition and additivity property of the function $\mu(s)$. We do not go through the proof here since it is very long and it does not need any change. ■

It is interesting to briefly discuss the Holevo upper bound for $E(0^+)$ for mixed-state channels, that is, the right hand side of (225). First observe that, in the classical case, that is when all states commute, the expression reduces to

$$E(0^+) \le 2 \max_{p} -\sum_{i} \sum_{k} p_i p_k \log \sum_{j} \sqrt{P(j|i)P(j|k)}. \tag{227}$$

This bound (in the classical case) is much easier to prove than Berlekamp's bound (223). First note that Berlekamp's bound is relatively simple to prove for the case of pairwise reversible channels, exploiting the fact that $d_C(m, m') = d_B(m, m')$ in that case (see [10, Cor. 3.1]). Essentially the same proof allows one to derive the bound (224), since $d_C(m, m') = d_B(m, m')$ also holds for pure-state channels as discussed in the previous section. For general classical channels, the same proof can be used by bounding $d_C(m, m')$ with $2 d_B(m, m')$ as explained before, and this leads to the bound (227). For general classical-quantum channels, finally, Holevo's bound on the right hand side of (225) is obtained with the same procedure by using the bound $d_C(m, m') \le 2 d_F(m, m')$.

The proof of Berlekamp's bound (223) and of (226) for general classical and classical-quantum channels, respectively, is instead more complicated (see the proof of [10, Th. 4]) and it relies heavily on the fact that the number of codewords $M$ can be made as large as desired. Roughly speaking, it is a combinatorial result on possible joint compositions of pairs of codewords extracted from arbitrarily large sets. The interested reader can check that the truly remarkable result in the proof of [10, Th. 4] is a characterization of the joint compositions between codewords, which implies that the asymmetries of the functions $\mu_{i,k}(s)$ can be somehow "averaged" due to the many possible pairs of codewords that can be compared (see [10, eqs. (1.40)-(1.45), (1.54)] and observe that only the additivity of $\mu(s)$ and the fact that $\mu'_{i,k}(1/2) = -\mu'_{k,i}(1/2)$ are used).

We can now consider the low-rate upper bound to the reliability function derived in [38]. Blahut considers a class of channels called nonnegative definite channels as introduced by Jelinek [37], that is, channels for which the matrix with elements $\exp(\alpha\mu_{i,k}(1/2))$ is nonnegative definite for all positive values of $\alpha$. These are precisely the same channels that we have discussed in Section III-C when explaining the connection of the umbrella bound with the expurgated bound. For these channels, Blahut [38, Th. 8] shows that it is possible to relate the smallest Bhattacharyya distance between codewords of a constant composition code for a given positive rate $R$ to a function $E_U(R)$ which he defines as



Eu{R) = 



max mm 



J2J2P^Ph\kPr\k log J2 VPij\h)PU\r) 



where 



Ph 



(228) 



and $I(p; P)$ is the mutual information between a variable with marginal $p$ and another variable with conditional distribution $P$ given the first. Then, in [38, Sec. VI], Blahut derives an upper bound on $E(R)$ by bounding the probability of error between codewords using the Bhattacharyya distance and thus deriving $E(R) \le E_U(R)$. It is this author's opinion that the bound derived in [38, Sec. VI] is not correct for general classical channels. A careful inspection indeed shows that, in the study of error exponents in binary hypothesis testing between codewords, Blahut essentially substitutes the Chernoff distance $d_C(m, m')$ between codewords $m$ and $m'$ with their Bhattacharyya distance $d_B(m, m')$, invoking the fact that the words have the same composition. If their joint composition

'''in this author'.s opinion, tlie problem is in the fact that constant composi- 
tion of the codewords does not imply that the first term of the fourth equation 
in 1381 pag. 669] be zero, as stated in the paper 
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is not symmetric, however, it is not possible to assert that 
dcim^m') — dBirrijin') and it is actually possible to find 
examples of codewords of the same composition for which 
dc{m,m') w 2dB{m,Tn'). This situation can be approached 
as close as desired, for example, using the words '123' and 
'231' for a channel with the same structure of that in [10, Fig. 
3] and different parameters. Thus, in this author's opinion, the 
proof of 1 38 Th. 10] is only correct in the case of pairwise 
exchangeable channels and [38. Th. 12] only applies to such 
channels. This also clarifies the supposed tightness of Blahut's 
bound at zero-rate despite the fact that its proof if far simpler 
than Berlekamp's one. Indeed, while Berlekamp's bound holds 
for all channels, Blahut's bound would require a coefficient 
2 in the error exponent for general channels. As discussed 
previously, this is clarified by thinking that in Berlekamp's 
bound, the use of the Bhattacharryya distance in place of the 
Chernoff distance strongly relies on the fact that the number 
of possible codeword comparisons can be made as large as 
desired for N large enough. It is not possible to bound the 
probability of error between two codewords It is no possible 
in fact to bound the probability of error between any pair of 
codewords by is unbounded and cannot be applied only two 
codewords. 
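The gap between the two distances can be checked numerically. The following is a minimal sketch only: the cyclic three-input channel and its parameters are hypothetical choices made for illustration (they are not taken from [10, Fig. 3] or [38]), and the function names are ours. It compares the Chernoff and Bhattacharyya distances for the words '123' and '231', whose joint composition is not symmetric, and for the words '12' and '21', whose joint composition is symmetric.

```python
import math

def pair_mu(row_x, row_y, s):
    """mu_{x,y}(s) = log sum_j P(j|x)^(1-s) * P(j|y)^s for two input letters."""
    return math.log(sum(p ** (1.0 - s) * q ** s
                        for p, q in zip(row_x, row_y) if p > 0.0 and q > 0.0))

def word_mu(channel, word_a, word_b, s):
    """mu(s) for two codewords; it is additive over the letter positions."""
    return sum(pair_mu(channel[a], channel[b], s) for a, b in zip(word_a, word_b))

def bhattacharyya_distance(channel, word_a, word_b):
    """d_B = -mu(1/2)."""
    return -word_mu(channel, word_a, word_b, 0.5)

def chernoff_distance(channel, word_a, word_b, iterations=200):
    """d_C = -min over s in [0, 1] of mu(s); mu is convex, so ternary search works."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        left = lo + (hi - lo) / 3.0
        right = hi - (hi - lo) / 3.0
        if word_mu(channel, word_a, word_b, left) < word_mu(channel, word_a, word_b, right):
            hi = right
        else:
            lo = left
    return -word_mu(channel, word_a, word_b, 0.5 * (lo + hi))

# Hypothetical cyclic channel: each row is a cyclic shift of (A, B, C), with C << B << A.
B, C = 1e-6, 1e-60
A = 1.0 - B - C
CHANNEL = {"1": (A, B, C), "2": (C, A, B), "3": (B, C, A)}

# Same composition, non-symmetric joint composition.
d_b_cyclic = bhattacharyya_distance(CHANNEL, "123", "231")
d_c_cyclic = chernoff_distance(CHANNEL, "123", "231")

# Same composition, symmetric joint composition.
d_b_swap = bhattacharyya_distance(CHANNEL, "12", "21")
d_c_swap = chernoff_distance(CHANNEL, "12", "21")

print("123 vs 231: d_C / d_B =", d_c_cyclic / d_b_cyclic)
print("12  vs 21 : d_C - d_B =", d_c_swap - d_b_swap)
```

With these illustrative parameters the ratio for '123' and '231' is well above one, and it moves toward 2 as C is made smaller relative to B, while for '12' and '21' the two distances coincide up to numerical precision.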

In any case, Blahut's bound applies to nonnegative definite
pairwise exchangeable channels, and it can be applied to
general channels in the modified form $E(R) \le 2E_U(R)$.
Bringing this into the classical-quantum setting, this implies
that Blahut's bound can be applied as is, for example, to all
nonnegative definite pure-state channels, and it can be applied to
all nonnegative definite channels with the correcting coefficient 2.
While this coefficient makes the bound much looser than the straight
line bound in the classical case, in the quantum case it results
in the best known upper bound to $E(R)$, since no straight line
bound has been obtained yet.

Another important observation is that, for the particular
case of binary symmetric channels, the Chernoff distance
between messages $d_C(m,m')$ is proportional to the Hamming
distance between the codewords $d_H(x_m, x_{m'})$. Thus, for a
given $R$, any upper bound on the minimum Hamming distance
between codewords is also an upper bound on the minimum
Chernoff distance between messages, which easily implies a
bound on $E(R)$. In particular, the reliability function $E(R)$
can be bounded using the Elias bound on the minimum Hamming
distance between binary codewords. The resulting bound is
tighter than $E_U(R)$ in the classical case as well as in the
classical-quantum case.
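The proportionality between the Chernoff and Hamming distances can be verified directly in the classical case. The sketch below assumes a classical binary symmetric channel with a hypothetical crossover probability EPS; the two binary words and all function names are chosen only for illustration. Each position where the words differ contributes $-\log(2\sqrt{\epsilon(1-\epsilon)})$, and equal positions contribute nothing.

```python
import math

def hamming_distance(x, y):
    """Number of positions in which the two words differ."""
    return sum(a != b for a, b in zip(x, y))

def bsc_mu(x, y, s, eps):
    """mu(s) between two words on a BSC; equal letters contribute log(1) = 0."""
    total = 0.0
    for a, b in zip(x, y):
        if a != b:
            total += math.log(eps ** (1.0 - s) * (1.0 - eps) ** s
                              + (1.0 - eps) ** (1.0 - s) * eps ** s)
    return total

def bsc_chernoff_distance(x, y, eps, grid=2001):
    """d_C = -min_s mu(s), evaluated on a uniform grid that contains s = 1/2."""
    return -min(bsc_mu(x, y, k / (grid - 1), eps) for k in range(grid))

EPS = 0.1                                   # hypothetical crossover probability
PER_LETTER = -math.log(2.0 * math.sqrt(EPS * (1.0 - EPS)))
X, Y = "0110100", "1110001"                 # illustrative codewords

d_h = hamming_distance(X, Y)
d_c = bsc_chernoff_distance(X, Y, EPS)
print("Hamming distance:", d_h)
print("Chernoff distance:", d_c, "vs", d_h * PER_LETTER)
```

Since $\mu(s)$ is symmetric around $s = 1/2$ for this channel, the minimum is attained at $s = 1/2$, so the Chernoff and Bhattacharyya distances coincide here.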

Finally, it is worth pointing out that, as opposed to the
zero-rate bound, deriving a quantum version of the straight
line bound seems to be a more complicated task, if possible
at all. The straight line bound in the classical case, indeed,
is proved by exploiting the fact that a decoding decision can
always be implemented in two steps by splitting the output
sequence into two blocks, applying a list decoding on the first
block and a low rate decoding on the second one. In the case
of quantum channels, this procedure does not apply directly,
since the optimal measurements are in general entangled and
are not equivalent to separable measurements. This topic is
the subject of ongoing research.



X. CONCLUSION AND FUTURE WORK 

In this paper, we have considered the problem of lower
bounding the probability of error in coding for discrete
memoryless classical and classical-quantum channels. A
sphere-packing bound has been derived for the latter, and it was
shown that this bound provides the natural framework for
including Lovász's work into the picture of classical bounds to
the reliability function. An umbrella bound has been derived
as a first example of use of the sphere-packing bound applied
to auxiliary channels for bounding the reliability of channels
with a zero-error capacity. Additional side results have been
obtained, showing that interesting connections exist between
classical channels and pure-state channels. Future work will
include an analysis of some of the bounds derived here and
an attempt to improve them by means of known techniques
already used with success in related works. There are (at least)
three important questions that should be addressed by future
work in this direction. The first is the possibility of finding
a smooth connection between the sphere-packing bound and
$\vartheta_{sp}$, as indicated in Fig. 2. The second is the possibility of
including Haemers' bound to $C_0$ into the same picture, by
expanding the theory if required, in order to obtain a bound
to $C_0$ that is more general than both Lovász's and Haemers'.
The third important question to address is whether it is
possible to extend the straight line bound to classical-quantum
channels. This would give very good bounds in the low rate
region for all channels without a zero-error capacity.
Appendix A
Continuity of $R_n(s, q, f_s)$

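For any $s$, $0 < s < 1$, and probability distribution $p =
\{p_1, p_2, \ldots, p_K\}$, let

$$\alpha(s,p) = \sum_{k=1}^{K} p_k S_k^{1-s} \tag{229}$$

and call $\mathcal{A}(s)$ the convex set of all such operators when $p$
varies over the simplex of probability distributions. Recall that

$$E_0\left(\frac{s}{1-s}, q\right) = -\log \mathrm{Tr} \left( \sum_{k=1}^{K} q_k S_k^{1-s} \right)^{1/(1-s)} \tag{230}$$

$$\phantom{E_0\left(\frac{s}{1-s}, q\right)} = -\log \mathrm{Tr}\, \alpha(s,q)^{1/(1-s)}. \tag{231}$$

Let then $p_s$ be a choice of $p$ that maximizes $E_0(s/(1-s), p)$,
so that

$$E_0\left(\frac{s}{1-s}, p_s\right) = -\log \min_{\alpha \in \mathcal{A}(s)} \mathrm{Tr}\, \alpha^{1/(1-s)}. \tag{232}$$

Such a maximizing $p_s$ must exist, since $E_0(s/(1-s), p)$ is
continuous on the compact set of probability distributions, but
it need not be unique.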
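As a sanity check of the relation between $\alpha(s,p)$ and $E_0$, one can look at the special case in which all the states $S_k$ commute, so that the channel is classical and each $S_k$ reduces to a diagonal matrix holding one row of the transition matrix. This restriction is an assumption of the sketch, as are the example channel, the distribution and the function names. In that case $-\log \mathrm{Tr}\,\alpha(s,p)^{1/(1-s)}$ should coincide with Gallager's classical $E_0(\rho, p)$ evaluated at $\rho = s/(1-s)$.

```python
import math

def e0_gallager(rho, p, channel):
    """Classical E_0(rho, p) = -log sum_j (sum_k p_k P(j|k)^(1/(1+rho)))^(1+rho)."""
    total = 0.0
    for j in range(len(channel[0])):
        inner = sum(pk * row[j] ** (1.0 / (1.0 + rho)) for pk, row in zip(p, channel))
        total += inner ** (1.0 + rho)
    return -math.log(total)

def alpha_diagonal(s, p, channel):
    """Diagonal of alpha(s, p) = sum_k p_k S_k^(1-s) when every S_k is diagonal."""
    return [sum(pk * row[j] ** (1.0 - s) for pk, row in zip(p, channel))
            for j in range(len(channel[0]))]

def e0_from_alpha(s, p, channel):
    """-log Tr alpha(s, p)^(1/(1-s)), computed in the commuting (classical) case."""
    return -math.log(sum(a ** (1.0 / (1.0 - s)) for a in alpha_diagonal(s, p, channel)))

# Hypothetical two-input, three-output channel and input distribution.
CHANNEL = [(0.7, 0.2, 0.1),
           (0.1, 0.6, 0.3)]
P = (0.4, 0.6)

# Largest discrepancy between the two expressions over a few values of s.
max_gap = max(abs(e0_from_alpha(s, P, CHANNEL) - e0_gallager(s / (1.0 - s), P, CHANNEL))
              for s in (0.1, 0.3, 0.5, 0.7, 0.9))
print("largest discrepancy:", max_gap)
```

The two expressions agree because $1/(1+\rho) = 1-s$ and $1+\rho = 1/(1-s)$ when $\rho = s/(1-s)$.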
Note now that the state $f_s$ as defined in (121) is given by

$$f_s = \frac{\alpha(s,q_s)^{1/(1-s)}}{\mathrm{Tr}\,\alpha(s,q_s)^{1/(1-s)}}. \tag{233}$$

Our aim is to prove that the function

$$R_n(s,p,f_s) = -\sum_{k} p_k \mu_{k,f_s}(s) - (1-s)\sum_{k} p_k \mu'_{k,f_s}(s) \tag{234}$$

$$\qquad\qquad + \frac{1}{\sqrt{n}}(1-s)\sqrt{2\sum_{k} p_k \mu''_{k,f_s}(s)} + \frac{1}{n}\log 8 \tag{235}$$

is continuous in $s$. We do this by proving that the density
operator $\alpha(s,p_s)$ is a continuous function of $s$, from which the
continuity of $f_s$ follows, implying the continuity of $\mu_{k,f_s}(s)$
and of its first two derivatives, by means of the Nussbaum-Szkoła
mapping and the relations (88) and (89) when applied to the states
$S_k$ and $f_s$.

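First observe that, for fixed $p$, both $\alpha(s,p)$ and $E_0(s/(1-s),p)$
are continuous functions of $s$. Assume that $\alpha(s,p_s)$ is
not continuous at the point $s = \bar{s}$, where $0 < \bar{s} < 1$. By
definition, there exists a $\delta_1 > 0$ and a sequence $s_n$ such that
$s_n \to \bar{s}$ but $\|\alpha(\bar{s},p_{\bar{s}}) - \alpha(s_n,p_{s_n})\|_1 > \delta_1$. By picking an
appropriate subsequence if necessary, assume without loss of
generality that $p_{s_n} \to \bar{p}$, so that $\alpha(s_n,p_{s_n}) = \alpha(s_n,\bar{p}) +
\epsilon_1(n) = \alpha(\bar{s},\bar{p}) + \epsilon_2(n)$ for some vanishing operators $\epsilon_1(n)$
and $\epsilon_2(n)$. Note that $\alpha(\bar{s},\bar{p}) \in \mathcal{A}(\bar{s})$ and, by construction,
$\|\alpha(\bar{s},p_{\bar{s}}) - \alpha(\bar{s},\bar{p})\|_1 \ge \delta_1$. Since $\alpha(\bar{s},p_{\bar{s}})$ is the choice of
$\alpha$ that minimizes the function $\mathrm{Tr}\,\alpha^{1/(1-\bar{s})}$ over the convex
domain $\mathcal{A}(\bar{s})$, since this function is strictly convex in that
domain, and since $\alpha(\bar{s},\bar{p}) \in \mathcal{A}(\bar{s})$ is bounded away from
$\alpha(\bar{s},p_{\bar{s}})$ by a constant $\delta_1$, there exists a fixed $\delta_2 > 0$
such that

$$\mathrm{Tr}\,\alpha(\bar{s},\bar{p})^{1/(1-\bar{s})} > \mathrm{Tr}\,\alpha(\bar{s},p_{\bar{s}})^{1/(1-\bar{s})} + \delta_2. \tag{236}$$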
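But then,

$$\begin{aligned}
\mathrm{Tr}\,\alpha(s_n,p_{s_n})^{1/(1-s_n)} &= \mathrm{Tr}\,(\alpha(\bar{s},\bar{p}) + \epsilon_2(n))^{1/(1-s_n)} \\
&\ge \mathrm{Tr}\,\alpha(\bar{s},\bar{p})^{1/(1-s_n)} - \epsilon_3(n) \\
&\ge \mathrm{Tr}\,\alpha(\bar{s},\bar{p})^{1/(1-\bar{s})} - \epsilon_4(n) \\
&> \mathrm{Tr}\,\alpha(\bar{s},p_{\bar{s}})^{1/(1-\bar{s})} + \delta_2 - \epsilon_4(n) \\
&\ge \mathrm{Tr}\,\alpha(s_n,p_{\bar{s}})^{1/(1-s_n)} + \delta_2 - \epsilon_5(n),
\end{aligned}$$

where $\epsilon_3(n)$, $\epsilon_4(n)$ and $\epsilon_5(n)$ are vanishing positive functions of
$n$. This implies that $p_{s_n}$ is not optimal for $n$ large enough,
contrary to the assumed hypothesis.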

Appendix B 
Historical note 

Since the results of this paper are essentially based on the
original proof of the sphere-packing bound as given in [9],
we believe some historical comments on that proof may be of
interest to the reader. It is important to point out, in fact, that
even if [9] contains the first formal proof of this result, the
bound itself had already been accepted before, at least among
information theorists at MIT.

The main idea behind the sphere-packing bound was
Shannon's [52]. Elias proved the bound for the binary symmetric
channel in 1955 0, and the bound for general DMC was first 



stated by Fano [8, Ch. 9] as an attempt to generalize Elias's
ideas to the non-binary case. Fano's proof, however, was
not completely rigorous, although it was correct with respect
to the elaboration of the many complicated and "subtle"
equations that allowed him to obtain the resulting expression
for the first time. Fano's approach already contained the main
idea of considering a binary hypothesis test between some
appropriately chosen codewords and a dummy output
distribution, and his procedure allowed him to solve the resulting
minmax optimization problem with a direct approach which,
in this author's opinion, could be defined "tedious" but not
"unenlightening" [9, p. 91], and which in any case opened
the way to the formal proof obtained later on.

It is not easy to understand precisely, from the published
papers, when the formal proof was obtained and by whom.
In fact, even if first published in the mentioned 1967
paper [9], the result must have been somehow accepted before,
at least at MIT, since Berlekamp mentions the bound and
the main properties of the function $E_{sp}(R)$ in an overview of
the known results on the reliability function in his 1964 PhD
thesis [51, Ch. 1: Historical Background], with references only
to [53], [8] and [15] "and others". Gallager, as well, mentions
the bound in his 1965 paper [15], attributing the "statement"
of the bound to Fano (see Section I and eq. (44)). Note also
that Fano and Gallager do not call it "sphere-packing bound"
while Berlekamp does in his thesis. It is worth pointing out that
many results were not published by their authors at that time;
see for example the Elias bound for the minimum distance
of binary codes, which is described in [10]. Analogously,
in the introduction to [8, Ch. 9], Fano credits Shannon for
a previous derivation, in some unpublished notes, of parts of
the results therein. It is known that Shannon was still very
productive during the '60s [54] and had many unpublished
results on discrete memoryless channels [55]. It is then not
immediately clear which parts of the ideas used in the proof of
the sphere-packing bound were already known at an empirical
level among MIT's information theorists. Furthermore, even if
the resulting expression was stated first by Fano, finding the
rigorous proof required a reconsideration of Elias' original
work for the binary case [52].

Finally, an important comment concerns the bound for
constant composition codes with non-optimal composition. It
is worth pointing out that, while Fano's version of the
sphere-packing bound includes the correct tight expression for the
case of fixed composition codes with general non-optimal
composition, the version given by Shannon, Gallager and Berlekamp
does not consider this case, and the bound is tight only for the
optimal composition. The reader may also note that it is not
even possible to simply remove the optimization over $p$ in that
bound, using $E_0(\rho, p)$ in place of $E_0(\rho)$, since it can be proved
that constant composition codes with non-optimal composition
$p$ achieve an exponent strictly larger than $E_0(\rho, p)$ at those
rates where the maximizing $\rho$ in the definition of $E_{sp}(R)$ is
less than one [24].